OpenAI has unveiled a new tool that recreates a person's voice with just 15 seconds of recorded audio.
Dubbed Voice Engine, the model takes a single 15-second clip to learn the person's voice and how they speak. From there, users can input text and have the model say whatever they want in a realistic-sounding voice that conveys emotion. The company said that it developed Voice Engine in 2022 and has used it in preset voices, but this is the first time it has discussed using a person's actual voice. In a blog post on Friday (March 29), OpenAI also acknowledged the obvious potential for malicious use.
"We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse," OpenAI wrote in a blog post. "We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities."
OpenAI added that based on how those conversations go, it'll decide how — or even if — it releases Voice Engine to the public.
The company wrote, "We will make a more informed decision about whether and how to deploy this technology at scale."
The implications of Voice Engine are huge. While it has legitimate uses, like quickly recording presentations or communicating more effectively, it also makes it easy to capture someone else's voice and use it for nefarious purposes. Indeed, many scams of that type already exist, and they're being used to dupe people into sending money and sharing information with scammers.
"We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker." https://t.co/yLsfGaVtrZ (March 29, 2024)
OpenAI argues that this risk is exactly why getting feedback is so important. The company said it's engaging with governments, media companies, entertainment companies, and educational institutions across the U.S. and internationally to discuss Voice Engine. Those parties are now testing Voice Engine and have agreed not to impersonate others. They must also disclose to anyone listening to the audio that the voice is AI-generated. OpenAI has also added watermarking so that audio generated by Voice Engine can be identified as synthetic.
"We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures," the company said.
Looking ahead, it's unknown what will come of Voice Engine. While it's possible that it'll eventually be made public, OpenAI may also determine that a release is not in the public's best interest. Either way, the company said, the technology is clearly possible to develop, and it's clearly here. "It's important that people around the world understand where this technology is headed," the company said, "whether we ultimately deploy it widely ourselves or not."