Voice is the future of human-computer interaction. I've said this several times recently and AI voice company ElevenLabs has a new product that further highlights the power of conversation in getting things done.
ElevenLabs Conversational AI system is a voice bot, setup to feel like you're making a phone call and holding a conversation with it is just like calling a human.
It is fully customizable, letting you select, design or even clone the voice it uses. You can also add your own knowledge base. For example, if you're making a math tutor you could include access to SAT prep guides.
The most useful aspect is being able to set the underlying brain, or language model. You can pick between any OpenAI, Google or Anthropic model or even include your own custom model if you're running a company.
How does Conversational AI work
Conversational AI is here.Build AI agents that can speak in minutes with low latency, full configurability, and seamless scalability. pic.twitter.com/JqBlwVczdXDecember 3, 2024
Unlike ChatGPT Advanced Voice this is not native speech-to-speech. It works like Gemini Live or MetaAI voice — you speak, it turns it to text and sends that to the AI. The AI responds in text and ElevenLabs voices it up using its existing voice models. This happens so fast it may as well be speech-to-speech.
To make this work ElevenLabs engineers had to create a new custom speech-to-text model that could transcribe the user's words fast enough that it wasn't noticeable, it then had to ensure it all worked seamlessly together.
With Conversational AI, ElevenLabs is directly competing with OpenAI's Realtime API offering. These are model systems designed to make it easier for a company or organization to offer voice-based interaction with products. This could be in a call center fielding phone calls or something less obvious like learning products.
One example use case could be in a children's toy, where the model is trained to offer support and feedback in an age-appropriate way.
Creating a voice assistant
Anyone with an ElevenLabs account can create a conversational agent. It comes with four default templates that can be fully customized.
One is a support agent called Eric designed to resolve issues, another is Matilda the math tutor and a third is a travel guide called George with information on most places around the world. The fourth is a video game wizard with a mysterious voice.
You can also create them from scratch and I tried it with a life coach given access to commonly used coaching tools such as habit tracking and goal setting. It uses Gemini 1.5 flash for speed and price reasons.
Making a call to the agent costs 500 credits per minute during development. The starter plan gives you 30,000 credits for $4 per month.
Overall it is a simple process to set up. There is a lot of flexibility in how you build it and your agents will appear in the sidebar of your ElevenLabs account. You can also import Twilio phone numbers and hook it up to your voice assistant.
For fun, I created a customer support agent named Ryan that uses a clone of my own voice. I'm going to see if my Dad notices when I give it a phone number and tell him it's my new work number and to call if he needs tech help.