The rise of the web has brought the world’s collective knowledge to the fingertips of more than two billion people. With just a short query you can access a webpage on a server thousands of miles away in a different country, or read a note from someone halfway around the world. But what happens if it’s in Hindi or Afrikaans or Icelandic, and you speak only English, or vice versa?
In 2001, Google started providing a service that could translate eight languages to and from English. It used what was then state-of-the-art commercial machine translation (MT), but the translation quality wasn’t very good, and it didn’t improve much in those first few years. In 2003, a few Google engineers decided to ramp up the translation quality and tackle more languages. That’s when I got involved. I was working as a researcher on DARPA projects looking at a new approach to machine translation, learning from data, which held the promise of much better translation quality. I got a phone call from those Googlers, who convinced me (I was skeptical!) that this data-driven approach might work at Google scale.
I joined Google, and we started to retool our translation system toward competing in the NIST Machine Translation Evaluation, a “bake-off” among research institutions and companies to build better machine translation. Google’s massive computing infrastructure and ability to crunch vast sets of web data gave us strong results. This was a major turning point: it underscored how effective the data-driven approach could be.
But at that time our system was too slow to run as a practical service: it took us 40 hours and 1,000 machines to translate 1,000 sentences. So we focused on speed, and a year later our system could translate a sentence in under a second, and with better quality. In early 2006, we rolled out our first languages: Chinese, then Arabic.
We announced our statistical MT approach on April 28, 2006, and in the six years since then we’ve focused primarily on core translation quality and language coverage. We can now translate among any of 64 different languages, including many with a small web presence, such as Bengali, Basque, Swahili, Yiddish, even Esperanto.
Today we have more than 200 million monthly active users on translate.google.com (and even more in other places where you can use Translate, such as Chrome, mobile apps, YouTube, etc.). People also seem eager to access Google Translate on the go (the language barrier is never more acute than when you’re traveling): we’ve seen our mobile traffic more than quadruple year over year. And our users are truly global: more than 92 percent of our traffic comes from outside the United States.
In a given day we translate roughly as many words as you’d find in 1 million books. To put it another way: what all the professional human translators in the world produce in a year, our system translates in roughly a single day. By this estimate, most of the translation on the planet is now done by Google Translate. (We can’t speak for the galaxy; Douglas Adams’s “Babel fish” probably has us beat there.) Of course, for nuanced or mission-critical translations, nothing beats a human translator, and we believe that as machine translation encourages people to speak their own languages more and carry on more global conversations, translation experts will be more crucial than ever.
We imagine a future where anyone in the world can consume and share any information, no matter what language it’s in, and no matter where it pops up. We already provide translation for webpages on the fly as you browse in Chrome, words in mobile photos, YouTube video captions, and speech-to-speech “conversation mode” on smartphones. We want to knock down the language barrier wherever it trips people up, and we can’t wait to see what the next six years will bring.
Posted by Franz Och, Distinguished Research Scientist, Google Translate