US-based tech giant Google has announced the addition of 110 new languages to Google Translate, the largest expansion of its language database to date. Google Translate is a free multilingual neural machine translation service developed by Google, designed to translate text, documents, and websites from one language into another.
One of the new languages is Balochi, which has long been absent from the Google Translate database despite being spoken by tens of millions of people in Balochistan, Pakistan, Afghanistan, Iran, and across the Gulf, including Bahrain, Dubai, Oman, and Kuwait.
Many will welcome and celebrate the addition of Balochi to Google Translate.
The company announced the additions in a blog post on Thursday, saying: “Google Translate breaks down language barriers to help people connect and better understand the world around them.”
The company said that in 2022 it added 24 new languages to Google Translate using Zero-Shot Machine Translation, where a machine learning model learns to translate a new language without ever seeing an example. That year it also announced its 1,000 Languages Initiative, a commitment to build AI models that will support the 1,000 most spoken languages around the world.
“Now, we’re using AI to expand the variety of languages we support,” the company’s blog post said. “… we’re rolling out 110 new languages to Google Translate, our largest expansion ever.”
The new additions include Punjabi (Shahmukhi), a variety of Punjabi written in the Perso-Arabic Shahmukhi script and spoken mostly in Pakistan. They also include Tok Pisin, an English-based creole and the lingua franca of Papua New Guinea.
Other notable additions include Tamazight (Amazigh), a Berber language spoken across North Africa, and N’Ko, a standardized form of West African Manding languages that unifies many dialects into a common language.
Google said in its blog post that Cantonese, also a new addition, has long been one of the most requested languages for Google Translate. Explaining the reason for its delayed addition, the company said that Cantonese often overlaps with Mandarin in writing, which makes it tricky to find data and train models.