Vidu is an AI video platform out of China hoping not only to take on leading players such as Runway and Kling but also to compete with OpenAI’s powerful, yet-to-be-released Sora.
Developed by Shengshu, Vidu is the first AI video tool to add 'multi-entity consistency', a feature that lets you splice together unrelated images into a single, cohesive video. The addition comes after a recent study found that AI video models mimic physics from images rather than understanding how it works.
For example, you could upload a photo of yourself and a random car, and the model could place you behind the wheel and make the car move. Another example, given by Vidu, is adding different clothing to a character using a second image of a coat or shirt.
What I like most about Vidu 1.5 is the degree of control it gives me as a creator when putting together my AI video. I can customize motion degree, resolution, duration and more. I need to do more testing but it is likely going to end up on my list of best AI video generators.
What can you do with Vidu 1.5?
Vidu 1.5 is the latest model from Shengshu, and alongside the multi-entity mode it has the usual text-to-video and image-to-video modes that other platforms offer. You can set a video to generate as photorealistic or as an illustration, and the motion isn’t bad.
Being able to generate clips in 1080p is also a big step up from the usual 720p limit of other platforms, although its text-to-video output isn’t as good as that of Runway, Kling or MiniMax.
“The future of content creation is here, and it is powered by the limitless possibilities of AI,” said Jiayu Tang, CEO and co-founder of Shengshu Technology. "At the core of this transformation lies the ability for anyone to engage in high-quality content production, unlocking new opportunities and breaking down traditional limitations."
Multi-entity consistency is probably one of the most innovative additions to AI video I’ve seen in a while. I tried it out, and not only does it let you steer the visuals of a video, it can also improve the overall motion, especially if you use it to provide different perspectives.
In one example I gave it three images of a skateboarder and added the extra perspectives to help generate a more fluid motion as the board moved across the steps.
In another test, I gave it a photograph of me and a picture of someone busking, and it was able to create a fairly accurate facsimile of me playing guitar, all from a single image of me.
Final thoughts
Part of what made the video of me playing guitar work was another feature called ‘Advanced Character Control’. According to Vidu, this offers more precise control over the way the camera moves, the cinematic techniques used in the output and the general motion in the video.
Finally, you can set the motion level to auto, low, medium or high, which, according to Vidu, allows the model to produce a more authentic and dynamic output.
Overall, I’m impressed with Vidu 1.5. It still has some work to do to match the very best in visual realism and motion, but it comes close, and in other respects it is a state-of-the-art model.
Multi-entity consistency is a significant enough feature to draw attention to Vidu on its own, and one I suspect other models will attempt to mimic in the near future.