There’s no denying that Google Translate helps us break down language barriers every day. In a recent blog post, Sveta Kelman, Senior Program Manager at Google Translate, introduced 13 new languages that were recently added to the translation service. This brings Translate’s language portfolio to a total of 103 languages, covering 99% of the online population.
“The 13 new languages — Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurdish (Kurmanji), Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto and Xhosa — help bring a combined 120 million new people to the billions who can already communicate with Translate all over the world,” wrote Kelman in the blog post.
How it all works
To be added to Translate, a language needs, besides being an actual written language, a significant amount of translations available online. Google’s machine learning takes it from there, combining open source licensed content with input from the Translate Community. Machine learning alone can learn a language to some extent by identifying statistical patterns in existing translated texts on the web, but there’s more to a language than what documents can offer. That’s why the Translate Community is so important to the whole process. Anyone can contribute, either by translating from scratch or by validating existing translations. Contributions earn you badges along the way, which raise your status in the community, though this is mostly symbolic.
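To make “identifying statistical patterns from existing translated texts” more concrete, here is a minimal sketch. It is not Google’s actual method: it assumes a tiny made-up set of aligned English and French sentences and scores word pairs with a simple Dice coefficient on co-occurrence counts. The names `PARALLEL` and `learn_word_pairs` are hypothetical.

```python
from collections import Counter
from itertools import product

# Made-up aligned sentence pairs (English, French), for illustration only.
PARALLEL = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
]

def learn_word_pairs(corpus):
    """Guess one target word per source word from co-occurrence statistics."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        pair_count.update(product(src_words, tgt_words))  # seen together

    best = {}
    for (s, t), both in pair_count.items():
        # Dice coefficient: high when two words mostly appear together.
        score = 2 * both / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}

print(learn_word_pairs(PARALLEL)["cat"])  # chat
```

No grammar rules are involved: “cat” maps to “chat” only because the two words keep showing up in the same sentence pairs. A system like this has nothing to go on for expressions it has rarely or never seen translated.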
“As already existing documents can’t cover the breadth of a language, we also rely on people like you in Translate Community to help improve current Google Translate languages and add new ones, like Frisian and Kyrgyz,” wrote Kelman. “So far, over 3 million people have contributed approximately 200 million translated words.”
The community’s impact is greatest with idiomatic expressions, where machine learning alone can fail and a single valuable input from a user can bring improvements. Translation example:
From English: piece of cake
To French [just machine learning]: part de gâteau
To French [with Translate Community]: c’est une affaire simple
The point is that a machine will probably always need human assistance to learn a language, since not everything in a language is completely logical. Google Translate doesn’t even apply grammatical rules, because its algorithms are based on statistical analysis, which is much more effective. The system builds on research that won a DARPA contest, carried out by the creator of Google Translate, Franz Josef Och. This translation method often uses English as an intermediary language, so if, for example, you wanted to translate something from your own language (L1) into, let’s say, French (L2), the whole process would look like this: L1->EN->L2. This is why there might be inconsistencies if you only type part of an expression.
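The L1->EN->L2 route can be sketched with two toy lookup tables. This is only an illustration of the pivot idea, not how Translate is implemented; the tables, their entries and the function name `pivot_translate` are all made up for the example.

```python
# Toy phrase tables; every entry is hypothetical and for illustration only.
TO_ENGLISH = {
    "es": {"pan comido": "piece of cake", "gracias": "thank you"},
}
FROM_ENGLISH = {
    "fr": {"thank you": "merci", "piece of cake": "part de gâteau"},
}

def pivot_translate(text, source, target):
    """Translate source -> English -> target using the toy tables."""
    # First hop: source language into English (unknown text passes through).
    english = TO_ENGLISH[source].get(text, text)
    # Second hop: English into the target language.
    return FROM_ENGLISH[target].get(english, english)

print(pivot_translate("gracias", "es", "fr"))     # merci
print(pivot_translate("pan comido", "es", "fr"))  # part de gâteau
```

The Spanish idiom “pan comido” (something very easy) reaches English correctly as “piece of cake”, but the second hop only knows the literal dessert, so the French result loses the meaning. Each hop is a separate chance for an expression to drift.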
Almost 10 years have passed since Google launched the translation service, and with the coverage it has reached today, it’s safe to say that its evolution has been impressive.
Fun facts about the 13 newly added languages:
- Amharic (Ethiopia) is the second most widely spoken Semitic language after Arabic
- Corsican (Island of Corsica, France) is closely related to Italian and was Napoleon’s first language
- Frisian (Netherlands and Germany) is the native language of over half the inhabitants of the Friesland province of the Netherlands
- Kyrgyz (Kyrgyzstan) is the language of the Epic of Manas, which is 20x longer than the Iliad and the Odyssey put together
- Hawaiian (Hawaii) has lent several words to the English language, such as ukulele and wiki
- Kurdish (Kurmanji) (Turkey, Iraq, Iran and Syria) is written with Latin letters while the other two varieties of Kurdish are written with Arabic script
- Luxembourgish (Luxembourg) completes the list of official EU languages Translate covers
- Samoan (Samoa and American Samoa) is written using only 14 letters
- Scots Gaelic (Scottish highlands, UK) was introduced by Irish settlers in the 4th century AD
- Shona (Zimbabwe) is the most widely spoken of the hundreds of languages in the Bantu family
- Sindhi (Pakistan and India) was the native language of Muhammad Ali Jinnah, the “Father of the Nation” of Pakistan
- Pashto (Afghanistan and Pakistan) is written in Perso-Arabic script with an additional 12 letters, for a total of 44
- Xhosa (South Africa) is the second most common native language in the country after Zulu and features three kinds of clicks, represented by the letters x, q and c
Let’s be honest: Google Translate does have its limitations. Can a machine truly translate a language? Of course not, at least not yet. But if there is something out there that can help you hold a basic live conversation with someone from a different part of the world who speaks another language, I would say that’s pretty impressive. Some languages are hard to fully understand even if you practice them every day for years, because they are deeply rooted in history and culture. Machine translation services might never reach perfection, but Google Translate is getting remarkably close.