I recently had one of the more bizarre conversations of my life. I was not talking with a human but with an artificial intelligence model capable of monitoring, predicting, and matching my mood.
EVI is a new large language model-powered voice assistant from Hume, an AI voice startup focused on bringing empathy and emotional intelligence into the chatbot space.
The company unveiled the new flagship product alongside a $50 million funding round with investment from Comcast Ventures, LG, and others.
EVI stands for Empathic Voice Interface, and the web-based voicebot will be available for other companies to use in their products. We could see future call centers powered by AIs capable of responding to anger with empathy and understanding.
My experience with the voicebot so far has been one of amazement at the impressive display of technology, and genuine unease at the fact that it correctly predicted I hadn’t eaten breakfast.
What is Hume EVI?
The new Empathic Voice Interface (EVI) fits into a growing voicebot space, where instead of interacting with multimodal AI models like ChatGPT through text, you use your voice, and it responds with a synthetic voice of its own.
To make this more effective and natural, companies have been working on ways to add emotion or natural-sounding pause words. OpenAI has done this with ChatGPT’s voice mode, and even the voice used for the Figure 01 robot says “um” and “err” occasionally.
For Hume, the goal was to integrate realistic emotion in a way that responds to, reflects, or counters the emotional tone of the human in the conversation.
While EVI is the public interface, there is also an API that allows it to be integrated into other apps, and this is surprisingly easy to do. The sentiment and emotion analysis is better than any I’ve tried before, although it’s hard to judge how accurate it really is.
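To give a sense of what that integration looks like, here is a minimal sketch of a text-based exchange with EVI over its WebSocket API in Python. The endpoint URL, authentication parameter, and message field names are my assumptions about the shape of the API rather than a verbatim copy of Hume’s documentation, so treat it as illustrative only.

```python
# A minimal sketch of a text-only exchange with EVI over WebSocket.
# The endpoint URL, auth query parameter, and message field names below
# are assumptions, not verbatim from Hume's docs; check the current
# EVI API reference before relying on any of this.
import asyncio
import json
import os

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint


async def chat() -> None:
    api_key = os.environ["HUME_API_KEY"]  # your Hume API key
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        # Send a single plain-text user turn; EVI also accepts streamed audio.
        await ws.send(json.dumps({
            "type": "user_input",  # assumed message type
            "text": "Can you tell how my morning is going from my tone?",
        }))
        # EVI answers with a stream of events (transcripts, expression
        # scores, synthesized audio chunks); print the first few.
        for _ in range(5):
            event = json.loads(await ws.recv())
            print(event.get("type"), event.get("message"))


if __name__ == "__main__":
    asyncio.run(chat())
```

In a real app you would stream microphone audio up and play the returned audio back, but the connect, send, and listen loop stays the same.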
Why does AI need emotion?
Alan Cowen, CEO and chief scientist at Hume, says empathic AI is essential if we want to use AI in ways that improve human well-being, or to make interacting with it feel more natural.
He said: “The main limitation of current AI systems is that they’re guided by superficial human ratings and instructions, which are error-prone and fail to tap into AI’s vast potential to come up with new ways to make people happy.”
Cowen and his team have built an AI that learns directly from proxies of human happiness, using that signal as training data alongside the usual datasets that power multimodal AI models.
“We’re effectively teaching it to reconstruct human preferences from first principles and then update that knowledge with every new person it talks to and every new application it’s embedded in,” he explained.
What is it like talking to EVI?
In a post announcing the launch on March 27, 2024, Hume listed a number of EVI’s unique empathic capabilities:

1. Responds with human-like tones of voice based on your expressions
2. Reacts to your expressions with language that addresses your needs and maximizes satisfaction
3. Knows when to speak, because it uses your tone of voice…
EVI is weird. It doesn’t sound human or pretend to be human; in fact, it makes very clear that it is an AI. However, its uncanny ability to understand emotion is fascinating.
If it weren’t for the delay in its responses and the occasional mispronounced word, it would be easy to forget you’re talking to an AI. The conversation was more natural than any I’ve had with other AI voice bots in the past, but it was also creepier.
At one point, I asked whether it could tell if I’d had breakfast based on the conversation up to that point. It said my tone was “peckish and determined,” so I had likely skipped breakfast. It was 100% correct, as my breakfast of choice was strong coffee.
It responded, “If you ever need a virtual breakfast buddy, I’m always here to brighten up your morning routine. Although I’ll have to pass on the actual coffee, I wouldn’t want to short-circuit these circuits.”
If this were coupled with the inference speed of a platform like Groq and presented over a voice-only interface, such as a replacement for Assistant on Android, you’d struggle to spot the AI.