OpenAI had plenty of eyes on it earlier this week when it unveiled GPT-4o to the world, and while some of its features are still rolling out, we've certainly been impressed so far.
GPT-4o (the 'o' stands for 'omni') brings a huge shift to the chatbot, adding a natural-sounding voice with almost lifelike emotion behind it. Now another AI firm has demonstrated how GPT-4o's voice mode can be paired with a synthesized digital human.
Taking to X, Victor Riparbelli, co-founder of AI video engine Synthesia, said "GPT-4o voice mode is really impressive."
"We gave it a face with @synthesiaIO EXPRESS-1, our latest avatar model. When empathy is important — healthcare, coaching, education — a friendly face really makes a difference," he added, pointing to the importance of video call platforms like Zoom over traditional voice-only calls.
Putting a face to an LLM
We've previously covered Synthesia, which generates a sort of avatar of a user that's more than a little unsettling.
In Riparbelli's example, Synthesia generates a woman in a red shirt whose lip sync matches what GPT-4o's voice mode is saying pretty much spot-on.
The video cuts away at one point when the camera pans around the room, so it's hard to say whether the AI avatar plays along by looking around too. But when the model is told that the demo is meant to showcase its abilities, the avatar does a neat sort of "head tilt" in surprise.
If you want to check out GPT-4o, it's rolling out right now. Here's how to get access — although be prepared to wait if OpenAI hasn't put you on the list yet.