ChatGPT just got a nifty new Advanced Voice Mode earlier this week, and although it’s only rolling out to a small subset of paying subscribers right now – in alpha testing – we’ve now been treated to various tasters of the feature in action.
These are popping up online, on the likes of YouTube and X, with the lucky, chosen ChatGPT Plus users who have access to the feature showing it off across a range of different tasks. As The Verge reports, these include requests to sing a song in a certain way, or imitate accents, through to tackling the nuances of correct pronunciation in languages.
If you recall, this functionality was actually revealed at the GPT-4o launch a few months back. Advanced Voice Mode was then delayed, apparently over concerns around tightening up safety with the feature, but it’s now here, and very definitely in action as mentioned – with some impressive results to boot.
For example, The Verge points out ChatGPT giving a lesson in the pronunciation of French words to a user on YouTube, where the AI is pretty helpful.
Here’s another example: a request to sing ‘Happy Birthday’ in a ‘soulful blues’ style. Or how about ChatGPT telling some jokes in different voices (shy, angry)?
ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind - it stopped to catch its breath like a human would) pic.twitter.com/oZMCPO5RPh (July 31, 2024)
Finally, check out the above and below posts on X of ChatGPT’s Advanced Voice Mode counting fast, and then tackling regional US accents.
ChatGPT Advanced Voice Mode attempting various US regional accents pic.twitter.com/UvDeQUNHLp (July 31, 2024)
If you’re keen to get in on the action yourself, we’ve been told by OpenAI that all ChatGPT Plus subscribers will get Advanced Voice Mode later this year. The full rollout should be completed by the ‘end of fall’ so everyone should have it by the time December gets here, in theory.
Analysis: 50 shades of cool
If you’ve checked out the above demos – pretty cool, huh? If not, get checking…
There’s some serious attention to detail on show in making Advanced Voice Mode seem more human-like and real. Note how the AI simulates the difficulty of counting to 50 super-quickly, including a pause for breath – a really neat touch.
Or the blues singing excursion, which isn’t just about the actual singing – which is nicely implemented, for sure – but also the in-depth explanations of how the singer might approach the song, and the natural style and delivery of the AI voice here (and elsewhere). These AI interactions are driven to new heights of realism, even if there are still wrinkles to be addressed.
As for those wrinkles, we weren’t so impressed with the US accents – though this was a big old ask, and they were a little better when the user asked ChatGPT to emphasize them more. And while the AI responses are generally very quick, fluid and to the point, there’s the odd moment of silence and confusion to be witnessed when viewing a range of these clips online.
Remember, though, Advanced Voice Mode is still in alpha, and given that, it’s really quite impressive – strikingly good in some scenarios. This could be one of the areas in which AI moves so fast that it becomes scary…