Meta released MusicGen, an AI text-to-music generator, as open source this week, allowing the world at large to make musical mayhem in 12-second installments to their heart's content. Now, Meta has introduced Voicebox, the most powerful AI text-to-speech generation software we’ve seen to date. So powerful, in fact, that you can’t have it – because you can’t be trusted to have it.
Meta did their homework on this one: they know that throwing this software out into the world would cause nothing but mayhem. Not an hour would pass before the internet was flooded with voice clips made by ne'er-do-wells, putting the most vitriolic things possible in the voices of others. No. A tool of this magnitude should be used with incredible responsibility. Locked away tight and used by only the most trusted and reliable of society.
Which is why Mark Zuckerberg wants to use it to make NPCs in the Metaverse sound cool.
What is Meta’s Voicebox?
Voicebox is a state-of-the-art AI model for not just speech generation but also speech recording tasks, such as editing, sampling and restyling. The multipurpose generative AI tool is something of a jack of all trades, suited to both converting text to human speech and editing the results. It can remove unwanted noises in recordings, reduce background static, and sample and modify existing recordings across six different languages.
While Voicebox, like many generative AI tools, was trained with over 50,000 hours of recorded speech (and transcripts from public domain audiobooks), Meta have developed a new approach that learns directly from raw audio and an accompanying transcription. This allows Voicebox to better recognise samples fed into it and to alter specific parts of a recording without having to regenerate the entire clip.
"Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more. More details on this work & examples ⬇️" (June 16, 2023)
The upshot is high-quality audio samples that are genuinely representative of how people actually talk to one another in the real world, with Meta ensuring a diverse sampling of speech so the same principle applies accurately to other languages. The results are impressive too, with Meta hosting a selection of them on their recent blog post. I’m not even kidding when I tell you I have a suspicion that Zuckerberg’s voiceover might actually be a product of the tool itself.
Meta believes that one day this technology will be vital in helping creators and content producers edit audio tracks, allowing the visually impaired to hear written messages from friends (in their voices), and allowing people to speak any foreign language in their own voice. That’s right, Mark Zuckerberg just oversaw the invention of the Babel fish.
And you can't have it.
Sadly, this isn’t one of the tools Meta feels comfortable about handing out so freely to the public at large. While Meta researchers have developed a “highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox,” the team still feels that there is a “potential for misuse and unintended harm.” No kidding.
While Meta don’t wish to share the final product, they have revealed the steps they took to get there. They believe the most ethical resolution is to publicly announce that they possess this technology and understand the risks and potential harms it poses, while working on tools to tell real audio from generated audio.
And you know what? Hats off to Meta on this one. It is the most ethical thing to do in that situation. While some would say that the most ethical thing to do would be to never develop it in the first place, it’s good to know that Meta are spending their resources on mitigating the damage such a tool could cause if misused. And it’s far better to announce it publicly than to one day be exposed as hoarding this technology, leaving the most suspicious among us to wonder what Meta had been using it for after all that time in the shadows.
The big Meta AI push is an interesting one to observe, with a genuine diversity of goals being explored all at once.