Eleven Labs has done it again. The pioneer in top quality AI generated voice and SFX audio, has just unveiled a new text to sound effects API.
To celebrate the occasion the company also released a very cool open source demo called Video to Sound Effects to showcase what the tech can do. It’s available online and at Github, and it’s pretty awesome.
Just take your generated video, upload it to the ElevenLabs demo webpage, and wait while the platform analyzes the video, and returns a choice of four different sound effect audio tracks to choose from.
Select the version you like and hit the download button to grab the video clip along with the new audio. Super simple. The whole process takes around 5 minutes from uploading a 5 second clip.
This is a new area of AI known as video-to-audio (V2A). Google recently announced a research project promising similar technology but that isn't yet available to try.
Putting ElevenLabs to the test
I tested it out using Luna Dream Machine (LDM) as my video generation tool. I tried five different video prompts with mixed results, but hey, it’s early days. Anyhoo, I eventually succeeded in getting a clip of a gorilla riding a Harley Davison motorbike, and uploaded it to the ElevenLabs demo page.
Within 20 seconds or so I had four audio samples to audition, chose one and started the download process. I have to say that despite some dodgy iterations the final result is actually pretty great. The video is hilarious, and the audio gives it a whole new dimension.
The tech works by sampling 4 frames at 1 second intervals from the uploaded video, which is sent to ChatGPT-4o to create a custom text-to-sound-effects prompt.
The prompt is then sent back to the ElevenLabs API to create the final SFX. It’s crude, but surprisingly effective. The results will never win an Oscar, or indeed a Golden Reels award, but as a quick and dirty way to give some life to a dull AI generated video clip, it works well.
We are excited to introduce the Text to Sound Effects API. To showcase it - we've built the first Video to Sounds Effects app. This app is available for free online and fully open-source. pic.twitter.com/8aalo8GCSoJune 17, 2024
While the demo is clearly aimed at the general public, the new API is aimed at serious business use.
The company is not only targeting sound effects with the tech, but also on-demand samples for music production, and dynamic sound for video games.
To deploy the API, customers will need an ElevenLabs account with an API key, and every generation will cost 100 characters, or 25 characters per second for set durations.