Rael Hornby

You might accidentally fall in love with ChatGPT's advanced Voice Mode

Retro computer with monitor blowing kiss to user in ASCII.

Everyone loves AI. They're AI-mad. It's all Google Gemini this, Anthropic Claude that, and Microsoft Copilot the other. But you especially love AI. Love, love. Or at least you will if one of OpenAI's biggest fears about ChatGPT's upcoming advanced Voice Mode comes true.

Yesterday, ChatGPT maker OpenAI published the GPT-4o System Card, a preparedness report that measures the potential risks its AI model could pose, alongside the safeguards the company has put (or will put) in place to mitigate them.

While many of the proposed risks and solutions revolve around protecting user privacy, such as ensuring that GPT-4o will not identify people from speech recordings, sandwiched in the middle of the report is OpenAI's looming fear that its recent advanced Voice Mode feature will lead its user base to anthropomorphize ChatGPT and form an emotional reliance on the chatbot.

In human speak, OpenAI is concerned that ChatGPT's human-like advanced Voice Mode is so convincing that a portion of its users will forget ChatGPT is a piece of software and end up becoming emotionally attached to the chatbot, not unlike the movie Her.

ChatGPT: What is advanced Voice Mode?

Under the heading "Anthropomorphization and emotional reliance," OpenAI highlights the possibility of users attributing "human-like behaviors and characteristics" to the chatbot, stating: "This risk may be heightened by the audio capabilities of GPT-4o, which facilitate more human-like interactions with the model."

GPT-4o is the latest version of the large language model that powers OpenAI's popular chatbot, ChatGPT. The new model was announced in May during the OpenAI Spring Update event, which introduced its new features and capabilities and previewed some of what's expected to arrive in ChatGPT across future updates.

One such feature was advanced Voice Mode, which promised to outfit ChatGPT with hyper-realistic, near-instant, human-like audio responses capable of carrying out a more natural conversation with users.

Advanced Voice Mode sees the chatbot display vocal emotion and use non-verbal cues, even pausing in places to simulate breaths. It's OpenAI's most ambitious human-computer interface yet, and it immediately left people stunned after it was revealed, perhaps none more so than Hollywood actor Scarlett Johansson: the "Sky" voice used to showcase GPT-4o's capabilities bore a striking resemblance to her own.

OpenAI's hyper-realistic conversational model may impact human-to-human interactions

With advanced Voice Mode now beginning to roll out to select ChatGPT Plus subscribers, it would seem that OpenAI is still holding on to concerns about how the wider public will react to this new hyper-advanced conversational mode.

In the published System Card report, OpenAI highlights how it observed "language that might indicate forming connections with the model" during early testing, with users expressing "shared bonds" through language such as "This is our last day together."

OpenAI admits that phrases like this could be benign, but it remains vigilant about the bonds users might form after accessing advanced Voice Mode, stating that such phrases "signal a need for continued investigation into how these effects might manifest over longer periods of time."

One of OpenAI's concerns is that "human-like socialization with an AI model may produce externalities impacting human-to-human interactions." The company offers the example that humans forming social bonds with ChatGPT may reduce their need for actual human interaction.

While acknowledging that this may benefit those struggling with loneliness, OpenAI is also quick to point out that it may warp an individual's perception of social norms, citing the potential for social faux pas: users may come to see interrupting others mid-conversation as acceptable and normal, because that's one of the ways they can interact with ChatGPT's speech model.

While no alarm bells appear to have rung during advanced Voice Mode's testing phase, OpenAI hopes that "more diverse user populations, with more varied needs and desires from the model, in addition to independent academic and internal studies will help us more concretely define this risk area."

ChatGPT's advanced Voice Mode is currently rolling out to select ChatGPT Plus subscribers, with a wider release expected to arrive before the end of the year.

More from Laptop Mag
