It’s one thing to have AI that can create videos for you, but what if you want them to have sound, too? Google’s DeepMind team now says that it has developed video-to-audio (V2A) technology that can generate soundtracks - music, sound effects and speech - from a combination of text prompts and the video’s pixels.
This is the kind of news that might have soundtrack composers shuffling awkwardly in their seats - all the more so because, as well as working with automatic video generation services, V2A can also be applied to existing footage such as archive material and silent movies.
The text prompt aspect is interesting because, as well as being able to input ‘positive prompts’ that will guide the audio in the direction you want, you can also add ‘negative prompts’ which tell the AI to avoid certain things. This means that you can generate a potentially infinite number of different soundtracks for any one piece of video.
One of DeepMind’s demo clips was generated using the prompt "A drummer on a stage at a concert surrounded by flashing lights and a cheering crowd".
The system is also capable of creating audio using just video pixels, so no text prompts are required if you don’t want to use them.
Google DeepMind admits that V2A currently has some limitations - the quality of the audio depends on the quality of the video, and lip synchronisation when generating speech isn’t perfect - but says that it’s doing further research in a bid to address these.
Find out more and check out further examples on the Google DeepMind website.