Udio is the latest artificial intelligence music tool to hit the market, coming out of stealth with a bang as it unveils an uncanny ability to capture emotion in synthetic vocals.
The brainchild of former Google DeepMind engineers, the platform has already drawn both investment and attention from parts of the music community including will.i.am and Common.
A handful of tracks leaked on X and other platforms ahead of the big launch, leading to speculation over just how good this new AI tool might be. I’ve been trying it for a little over a week, and in my opinion it is a Sora-like moment for AI music.
It has the same ability to create a complete track from a text prompt as Suno — which is still an impressive tool — but has much better vocals and a more natural sound.
Its ability not just to capture the emotion of a song but also to generate the bizarre and unexpected, all while maintaining musical fidelity and cohesion, is astounding. For example, I generated all the tracks in this story, merging unusual genres with ease.
What is Udio?
I had the chance to chat with the founders David Ding and Andrew Sanchez about Udio and they told me it was inspired by a desire to make it easier to create and share music.
“This is a magic moment,” said Sanchez. “It is really magic for people to go from zero to something.” That is why they decided to focus, at least initially, on being able to create a complete song from text, to give people that “wow” moment.
Future updates will include more musician-focused tools, such as the ability to add reference vocals, more granular creation options and easy import of external tracks. For now the focus is on building a library of amazing tracks created by people with little or no musical ability.
The pair wouldn’t be drawn on the underlying architecture of the model or the training data, but did say they have strong copyright protection measures in place. For example, you can’t reference a specific artist in a prompt, just as with Suno, and Udio also blocks a track if it sounds too much like a particular artist.
How does Udio work?
Like any AI tool, it starts with text. You type in a prompt, click generate, and it makes two completely different tracks based on that theme. However, you can also give it your own lyrics, make the track an instrumental or add more specific genre tags to steer the generation.
After playing with it for a week, I’ve found you get the most accurate generation by giving it a rough one-line lyric and a story to steer the direction of the text model, then a descriptive genre to set the direction of the music model. For example, a line about ghosts at a haunted hoedown paired with “gothic bluegrass” as the genre tag.
When you generate a track, Udio splits the task: first it creates lyrics using a traditional large language model, then it creates the music using what I assume is a diffusion transformer model similar to those found in OpenAI’s Sora or Stable Diffusion 3, although that hasn’t been confirmed by the Udio team.
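To make that two-stage split concrete, here is a purely illustrative Python sketch. The function names, signatures and placeholder bodies are my own assumptions for the sake of explanation; Udio has not published an API, and the architecture described above is speculation rather than a confirmed design.

# Illustrative sketch only: not Udio's API, and the model split is assumed.

def generate_lyrics(prompt: str) -> str:
    """Stage 1: a large language model turns the text prompt into structured lyrics."""
    # Placeholder output; a real system would call an LLM here.
    return "[Verse 1]\nFiddles scream at midnight in the old barn...\n[Chorus]\n..."

def generate_audio(lyrics: str, genre_tags: list[str]) -> bytes:
    """Stage 2: a generative audio model (possibly a diffusion transformer)
    renders the lyrics and genre tags into a waveform."""
    # Placeholder output; a real system would return rendered audio samples.
    return b""

def text_to_song(prompt: str, genre_tags: list[str]) -> bytes:
    lyrics = generate_lyrics(prompt)           # the text model sets the words and story
    return generate_audio(lyrics, genre_tags)  # the music model sets the sound

demo = text_to_song("a haunted hoedown", ["gothic bluegrass"])

The point of the split is that the words and the sound are steered separately, which is why a one-line lyric plus a descriptive genre tag tends to give the most predictable results.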
Users can then publish the track so the community can enjoy it, or download the audio or a video file to share on other social media platforms or build out into another project.
One use case the team, and some of the artists they’ve worked with, pointed out is the potential for using Udio as a songwriting aid: taking a set of lyrics, defining a melody and creating an instant demo to send off to artists to be recorded in a real studio.
“This is a brand new Renaissance and Udio is the tool for this era’s creativity. With Udio you are able to pull songs into existence via AI and your imagination,” said will.i.am.
How well does Udio work?
In under a minute I was able to create a haunting but foot-stomping gothic bluegrass track about a haunted hoedown. I then selected one of the generated tracks and extended it, using granular controls to add an intro, a segment before or after, or an outro.
The resulting tune should have been a mess of mixed genres, but it was surprisingly effective. The AI model was able to create something compelling, original and somewhat weird, all from text.
The team keeps finding new skills they didn’t realize Udio had. “Recently I realized it could perform traditional Chinese folk music,” said Ding. “I’ve heard good Korean, Japanese and other languages.”
“There is nothing available that comes close to the ease of use, voice quality and musicality of what we’ve achieved with Udio — it’s a real testament to the folks we have involved,” he said.
Looking ahead, the team is working on support for more languages, the ability to split stems from individual tracks and potentially even the ability to specify the vocalist, but for now the focus is on building out a community around Udio.
One thing we could see is Udio being used as an alternative to sending a GIF, or as a way for people to express themselves or share an emotion in the form of a song. You could message a 30-second track for a loved one’s birthday instead of sending a card.