The rise of AI tools like ChatGPT and Google Bard has presented the perfect opportunity to make significant leaps in multilingual speech projects, advancing language technology and promoting worldwide linguistic diversity.
Meta has taken up the challenge, unveiling its latest AI language model, which can identify more than 4,000 spoken languages and can recognize and generate speech in over 1,100 of them.
The Massively Multilingual Speech (MMS) project means that Meta’s new AI is no mere ChatGPT replica. The model uses unconventional data sources to overcome language barriers and allow individuals to communicate in their native languages without going through a laborious translation process.
Most excitingly, Meta has made MMS open-source, inviting researchers to learn from and expand upon the foundation it provides. This move suggests the company is deeply invested in dominating the AI language translation space, but also encourages collaboration in the field.
Bringing more languages into the conversation
Normally, speech recognition and text-to-speech AI programs need extensive training on large volumes of audio data paired with meticulously transcribed labels. Many endangered languages found outside industrialised nations lack datasets on this scale, which puts these languages at risk of vanishing or being excluded from translation tools.
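According to Gizmochina, Meta took an interesting approach to this issue and turned to religious texts such as the Bible, which have been translated into a huge number of languages and are widely available as audio recordings. Because the same text exists in so many languages, these translations give Meta a ‘raw’ and largely untapped look at lesser-known languages for its research.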
The release of MMS as an open-source resource and research project demonstrates that Meta is devoting a lot of time and effort to addressing the lack of linguistic diversity in the tech field, which is frequently limited to the most widely spoken languages.
It’s an exciting development in the AI world - and one that could bring us a lot closer to having the sort of ‘universal translators’ that currently only exist in science fiction. Imagine an earpiece that, through the power of AI, could not only translate foreign speech for you in real time but also filter out the original language so you only hear your native tongue being spoken.
As more researchers work with Meta’s MMS and more languages are included, we could see a world where assistive technology and text-to-speech allow us to speak to people regardless of their native language, sharing information much more quickly. As someone trying to teach myself a language, I’m super excited for this development: it’ll make real-life conversational practice a lot easier, and help me get to grips with the informal and colloquial words and phrases only native speakers would know.