In September, Alexis Conneau left OpenAI, where he led the research team behind ChatGPT's voice feature that made for some unflattering headlines. One of the voices it produced bore a striking resemblance to that of actress Scarlett Johansson, who revealed that she had been asked to license her voice for the project and had refused.
Now, Conneau and his cofounder, Coralie Lemaitre, an engineer turned business strategist, are making waves with their new venture, WaveForms, which emerged from stealth today backed by $40 million in seed funding from venture capital firm a16z. WaveForms aims to take Conneau’s audio AI efforts to the next level by using the power of AI voices to unlock what he calls emotional general intelligence, or EGI.
When asked about the Johansson controversy, Conneau insisted that the ChatGPT voice was never meant to mimic Johansson. But he was not surprised that people were reminded of Johansson’s character in Her, a film about a man’s relationship with an AI assistant. “When they see the technology, they think about the movie right away,” he said.
In leaving OpenAI, Conneau cheekily nodded to the debate in a post on X, formerly Twitter, writing, “After an amazing journey at @OpenAI building #Her, I’ve decided to start a new company.”
“Obviously the movie Her has been an inspiration,” he said, but admitted the movie’s depiction of the complex, negative impacts of an AI relationship is “something that we should probably avoid—even though we might have loved the movie, that's not really what we want.”
Instead, Conneau explained that WaveForms' mission is to push the boundaries of audio AI, which he considers the “social-emotional layer” of AGI, or artificial general intelligence, the still-unachieved point at which AI can do certain things at least as well as humans. The goal is to enhance an AI’s ability to understand and respond to spoken language, including nuances like tone, inflection, and accent. “Audio is the first emotional, social emotional layer of AI,” he said.
WaveForms' audio LLMs, as Conneau put it, will be able to capture the emotional subtleties of voices (something assistants like Alexa and Siri cannot do, for example), understand the full context of a conversation, and convey complex emotions in return. An audio LLM for teaching, for example, could understand when a student is frustrated and could, in turn, become even more patient.
In fact, keeping the focus on the power of AI and voice is why Conneau left OpenAI. “I really enjoyed my time there, but as far as I’m concerned, companies like OpenAI, Google, and Meta are AGI-focused companies, but what we are talking about here is a different focus” that requires building a new company, he said.
A focus on tackling what was then one of the toughest AI challenges to crack, integrating audio intelligence directly into a large language model, is how Conneau ended up at OpenAI three years ago. Back then, he was a researcher at Facebook and sent an email to his AI hero, OpenAI cofounder and chief scientist Ilya Sutskever. After a dinner at which the two discussed the potential for immersive audio experiences in OpenAI's models, he was hired to start the voice mode project from scratch. “Nobody was working on this at the time because it’s one of the most complicated deep learning projects,” Conneau said.
“I’ve always wanted to work with Ilya,” he added.
Conneau declined to confirm whether Sutskever’s departure from OpenAI in May, after Sutskever played a role in CEO Sam Altman’s brief firing in 2023, was part of his own decision to leave. “Ilya is the greatest scientist we might ever see in AI,” he said. “I think it’s similar to Einstein in AI.”
As he works on WaveForms, Conneau once again emphasized that sci-fi movies like Her are inspiring when it comes to the technology they depict, but not necessarily in terms of how it's used.
“Is it going to replace human interaction? Are we going to be obsessed with computer interaction? I don’t think that’s the future that will happen,” he said. “If anything it will be kind of a complementary aspect of our social life and perhaps even reinforce the quality of our human interaction.”