Hume AI, the empathic voice model company, has just unveiled a demo integrating Anthropic’s Computer Use technology with Hume’s Empathic Voice Interface (EVI).
Hume’s demo video shows a user talking to their computer to set up a hands-free chess game with the Hume persona.
The computer sets up the board, invites the user to make the first move, and remains in full command of the board, the computer, and the conversation as the chess game progresses through three moves.
This all happens with no physical input: no keyboard, no mouse, nothing but some sultry AI voice chat. Voice control of a chess game isn’t new, but this goes much further than that.
On the face of it, the technology behind the demonstration is fairly well established by now. One model, Claude, handles the computer interactions: its multimodal training lets it ‘see’ the screen via screenshots, and it triggers actions as though it were pressing keys and moving the mouse itself.
The Hume model translates the user’s speech into text commands and feeds them to Claude, while also rendering Claude’s text output as dulcet tones for the user’s ears.
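In other words, the loop is speech-to-text, an agentic Claude call, then text-to-speech. Below is a minimal sketch, in Python, of how such a loop could be wired. The transcribe(), speak(), and run_computer_action() helpers are hypothetical stand-ins rather than real SDK calls (Hume’s actual EVI is a streaming WebSocket API), while the Claude call follows the general shape of Anthropic’s computer-use beta as documented at the time of writing.

```python
# Sketch of a voice -> Claude -> voice loop. transcribe(), speak(), and
# run_computer_action() are hypothetical stand-ins, not real library calls.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for EVI speech-to-text."""
    ...

def speak(text: str) -> None:
    """Hypothetical stand-in for EVI expressive text-to-speech."""
    ...

def run_computer_action(tool_input: dict) -> str:
    """Hypothetical executor: performs the screenshot/click/type action
    Claude requested and returns the result (e.g. a screenshot)."""
    ...

def handle_user_turn(audio: bytes, history: list) -> None:
    history.append({"role": "user", "content": transcribe(audio)})

    while True:
        response = client.beta.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=[{
                "type": "computer_20241022",  # virtual screen/keyboard/mouse
                "name": "computer",
                "display_width_px": 1024,
                "display_height_px": 768,
            }],
            messages=history,
            betas=["computer-use-2024-10-22"],
        )
        history.append({"role": "assistant", "content": response.content})

        # Claude either asks to act on the computer or replies in text.
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            break

        # Execute each requested action and feed the results back to Claude.
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_computer_action(block.input),
        } for block in tool_uses]
        history.append({"role": "user", "content": results})

    # Only the final text reply reaches the voice layer.
    speak("".join(b.text for b in response.content if b.type == "text"))
```

The key design point is the inner agent loop: Claude keeps requesting screenshots, clicks, and keystrokes until it has nothing left to do, and only its final text reply is handed to the voice model to be spoken aloud.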
"By integrating Claude with EVI, we've created something truly special. Claude's frontier natural language capabilities and personality complement EVI's expression understanding and empathy, so EVI can “act out” Claude’s responses and generate fluid, context-aware conversations that feel remarkably human,” says Hume co-founder, Alan Cowen.
It sounds deceptively simple, but behind the slick demo lies a huge amount of technology at every point. The relationship between Claude and Hume has been developing for some time, and some of the stats are staggering.
More than 2 million minutes of AI voice conversations have been completed using the integrated models, which in turn has helped drive a 10% reduction in latency through optimization work, as well as an impressive 80% reduction in costs.
A new way of using computers
All of these rapid advances in voice-computer communication, from OpenAI’s Advanced Voice Mode to Hume and even the open-source Whisper models, are pointing the way towards a future that Hollywood has long forecast.
It’s Star Trek meets The Jetsons meets a dystopian future full of talking teapots and impossibly sensual laser printers. The industry is calling it a ‘voice-first’ future.
As Cowen says, “In a few years, voice AI will be omnipresent, serving as the primary interface for human-AI interactions.”
By connecting Claude’s autonomous control of the computer with the super-fast responses of Hume’s expressive voice, we’ve been given an early glimpse of how humans and machines might interact in the future.
How you feel about that will depend on your current views on AI, and on the fate of humans in a world that still won’t have flying cars sorted out.