ChatGPT is the world's most popular AI tool by a considerable margin, so when new features arrive in OpenAI's chatbot, it's a pretty big deal.
The latest feature turning heads as it slowly rolls out to ChatGPT Plus subscribers is Advanced Voice Mode, the hyper-realistic conversational upgrade first showcased during GPT-4o's reveal.
It's something I've been incredibly excited to try, having spent the majority of my childhood dreaming of being the loveable rogue of a spacefaring pirate who's pals with his sentient shipboard AI. Think Knight Rider meets Han Solo if you want further insight into 10-year-old me's dreams of grandeur.
However, thanks to a recent system card published by OpenAI (the very same one that raises concerns that users may develop feelings for the chatbot), that childhood dream is going up in smoke as I'm faced with the horrifying reality that ChatGPT's Advanced Voice Mode might not be the AI best friend I've always wanted.
While I was hoping for my very own Chappie, it turns out we might be getting something far closer to Terminator 2's T-1000, as an unexpected behavior in the feature has seen it steal and mimic users' voices without consent.
My, what a nice voice you have
Unauthorized voice generation is just one of the risks identified by OpenAI in its latest system card, with the company explaining that, even though GPT-4o is capable of it, it can't offer users the ability to produce content in another person's voice due to fears of fraud and the spread of false information.
However, OpenAI highlights a rare but genuine issue with ChatGPT's Advanced Voice Mode that saw the model "unintentionally generate an output emulating the user's voice."
The team also provides an example of the chatbot doing exactly that: a short snippet of a wider conversation captured during the Advanced Voice Mode testing period (found below) that shows ChatGPT suddenly shifting mid-answer from a male voice to a cloned emulation of the user's voice.
The switch in voice is more than a little spooky on its own, but the fact that it happens right after the bot blurts out a random "No!" makes it all the more hair-raising.
GPT-4o's ability to mimic voices comes from OpenAI's Voice Engine, a powerful text-to-speech model that can emulate the voice of anybody based on nothing more than a 15-second audio clip.
That being said, OpenAI wants to assure us that it's taking steps to mitigate the issue, installing safety measures that prevent its model from deviating from the available preset voices by labeling them as the "ideal" completions, and adding an output classifier that detects whether GPT-4o is producing audio in a voice that differs from those presets.
With those safeguards in place, OpenAI has been able to catch 100% of "meaningful deviations" during internal evaluations where the team attempted to recreate the issue. However, unintentional voice generation remains a weakness of the model and may find its way around the company's safeguards in wider testing.
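For the curious, OpenAI hasn't detailed how that output classifier actually works, but conceptually it resembles a speaker-verification check: reduce the generated audio to a speaker embedding (a fixed-length vector that characterizes a voice) and ask how closely it matches the approved preset voices. The sketch below is purely illustrative and isn't OpenAI's implementation; the function names, the 0.85 threshold, and the random stand-in embeddings are all invented for the example.

```python
import numpy as np

# Hypothetical illustration only: OpenAI hasn't published its classifier.
# This sketch assumes each voice can be reduced to a speaker embedding and
# that generated audio can be compared to the presets via cosine similarity.

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_preset_voice(output_embedding: np.ndarray,
                    preset_embeddings: dict[str, np.ndarray],
                    threshold: float = 0.85) -> bool:
    """Return True if the generated audio matches any approved preset voice.

    `threshold` is an illustrative value; a real system would tune it
    against labelled examples of "meaningful deviations".
    """
    best_match = max(
        cosine_similarity(output_embedding, preset)
        for preset in preset_embeddings.values()
    )
    return best_match >= threshold

# Toy usage: random vectors stand in for real speaker embeddings.
rng = np.random.default_rng(0)
presets = {"voice_a": rng.normal(size=192), "voice_b": rng.normal(size=192)}
generated = rng.normal(size=192)  # embedding of the audio the model produced

if not is_preset_voice(generated, presets):
    print("Blocked: output voice deviates from the approved presets.")
```

In this framing, "catching 100% of meaningful deviations" simply means the check flagged every generated clip whose voice strayed too far from the presets during OpenAI's internal evaluations; the open question is whether that holds up once the feature meets millions of users.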
Outlook
Software glitches happen, and AI is prone to hallucinating and doing its own thing from time to time, but there's something more than a little unsettling about a chatbot suddenly donning a skin suit of you mid-conversation.
While I can place my faith in OpenAI having done its best to plug the holes in its model that allow such a thing to reach the end user, the fact that the model is even attempting this behind the scenes is still quite alarming, and it could pose a considerable risk if ChatGPT's safeguards were ever breached.
I'll still be checking out ChatGPT's Advanced Voice Mode when it eventually rolls out to me, but when I do, I'm not entirely sure I'll be able to shake the creepy feeling that it could be about to respond in an eerie and all-too-familiar voice.