The generative AI wars are building to a crescendo as more and more companies release models of their own. Generative video is shaping up to be the biggest current battleground, and Genmo is taking a different approach.
The company is releasing its Mochi-1 model as a 'research preview', but the new video generation model ships under an Apache 2.0 license, which makes it open source: anyone can take it apart and put it back together again.
That also means Mochi-1 is free to use, and you can try it for yourself on Genmo's site. Being open source also means it should appear on the usual generative AI platforms in future, and could one day run on a good gaming PC.
It is launching into a very competitive market, with rival services offering a range of capabilities: templates from Haiper, realism from Kling and Hailuo, and fun effects from Pika Labs and Dream Machine. Genmo says its focus is bringing state-of-the-art video generation to open source.
Genmo releases free AI video model
So, why use Genmo's model over any others on offer right now? It all comes down to motion. We spoke to Genmo's CEO Paras Jain, who explained that motion is a key metric when benchmarking models.
"I think fundamentally, for a very long time, the only uninteresting video is one which doesn't move. And I felt like a lot of AI video kind of suffered this 'Live Photo effect'," he explains. "I think our historical models had this; that was how the technology had to evolve. But video is about motion. That was the most important thing we invested in, above all else."
This initial release is a surprisingly small 10 billion parameter diffusion transformer that uses a new asymmetric architecture to pack more punch into a small package.
Jain said the team trained Mochi-1 exclusively on video, rather than the more traditional mix of video, images and text. This, he said, gave it a better understanding of physics.
The team then worked on ensuring the model could properly understand what people wanted it to make. He told us: "We've invested really, really heavily in prompt adherence as well, just following what you say."
Genmo hopes Mochi-1 can offer 'best-in-class' open-source video generation, but at present, videos are limited to 480p as part of the new research preview launching today.
As Jain mentions, a big focus has been placed on prompt adherence, too. Genmo benchmarks this using a vision language model as a judge, following the approach of OpenAI's DALL-E 3.
Will you be testing Mochi-1? Let us know. It's certainly entering a crowded landscape, but its open-source nature could see it extend further than some of its rivals.
It isn't even the only open-source AI video model to launch this week. AI company Rhymes dropped Allegro, "a small and efficient open-source text-to-video model". It is also available under an Apache license, although it generates 15 frames per second at 720p, versus Mochi-1's 24 frames per second at 480p.
Neither model will run on your laptop yet, but as Jain told us, the beauty of open source is that one day someone will fine-tune it to run on lower-powered hardware, and we'll be making videos offline.