OpenAI is giving free users of ChatGPT a sneak peek at its impressive Advanced Voice mode. The limited preview was confirmed during the company's DevDay event in San Francisco and will give non-subscribers a brief look at how it differs from the basic voice mode.
What makes Advanced Voice different from the basic voice mode, or even Google's new Gemini Live, is that it is native speech-to-speech. Instead of converting what you say to text, analyzing that text, and then reading a reply back as speech, the model listens to the audio itself, so it can pick up nuances in your tone of voice and other emotional cues.
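To make that distinction concrete, here's a minimal conceptual sketch in Python. Every function in it is a hypothetical stub standing in for a real model; none of them are actual OpenAI API calls. The point is simply to show where vocal nuance survives and where it gets thrown away.

```python
# Conceptual sketch only: these stubs stand in for real speech models so the
# flow is runnable. None of them are actual OpenAI API calls.

def transcribe(audio_in: bytes) -> str:
    # Stand-in for a speech-to-text model; real systems return only the words,
    # so tone, pacing, and emotion are dropped at this step.
    return "hello there"

def generate_text_reply(text: str) -> str:
    # Stand-in for a text-only chat model reasoning over the words alone.
    return f"You said: {text}"

def synthesize_speech(text: str) -> bytes:
    # Stand-in for a text-to-speech model reading the reply in a generic voice.
    return text.encode("utf-8")

def speech_to_speech_model(audio_in: bytes) -> bytes:
    # Stand-in for a native speech-to-speech model: audio in, audio out,
    # with no text bottleneck in between.
    return audio_in

def cascaded_voice_reply(audio_in: bytes) -> bytes:
    """Basic voice mode: a chain of separate models, losing vocal nuance."""
    text = transcribe(audio_in)             # speech -> text (prosody lost here)
    reply_text = generate_text_reply(text)  # the model only sees the words
    return synthesize_speech(reply_text)    # text -> speech in a generic voice

def native_voice_reply(audio_in: bytes) -> bytes:
    """Advanced Voice: one model hears the audio directly, so cues like a
    hesitant tone or a child's voice can shape how it responds."""
    return speech_to_speech_model(audio_in)
```

In the cascaded version, everything the model knows about you arrives as plain text; in the native version, it works on the audio itself, which is what lets Advanced Voice react to how you sound, not just what you say.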
I suspect OpenAI will eventually roll out Advanced Voice more fully to all users as it refines the underlying models and makes them cheaper to run.
There are rate limits on how much you can use Advanced Voice even as a $20-a-month ChatGPT Plus subscriber. That said, I've been using it fairly extensively for more than a month and haven't hit those limits yet.
Why is Advanced Voice a big deal?
"Starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Edu, and Team users globally. Free users will also get a sneak peek of Advanced Voice. Plus and Free users in the EU… we'll keep you updated, we promise." (OpenAI, October 1, 2024)
It's hard to explain why Advanced Voice is so much better than the likes of Gemini Live or Meta's new AI voice until you actually use it. Gemini Live is very impressive; Google's engineers managed to produce natural-sounding voices and let you interrupt in real time, but it lacks that extra something special.
For example, I was showing Advanced Voice to my three-year-old son and told it, "Hey ChatGPT, this is my son, he's three and his name's Theodore," and its tone changed immediately to the kind of voice you would use when speaking to a small child. It even addressed him directly and knew whether it was me or him speaking.
Another impressive feature is having it change its accent and then save that accent to its memory so that it continues to speak that way every time. This could include having it talk like Yoda, or a pirate, or a Yoda pirate!
We've also barely seen the potential of native speech-to-speech because of safety concerns and the guardrails placed on the underlying model. You very occasionally catch glimpses of what it can really do when it starts beatboxing or singing. In the future, as OpenAI finds ways to relax those guardrails safely, we should start to see more of these capabilities emerge.
That was basically a long way of saying that if you're on the free version of ChatGPT and you get a sneak peek of Advanced Voice, give it a go, because, like the Apple Watch, you don't know how good or useful it is until you actually try it.