Hailuo MiniMax launched earlier this year and very quickly became one of the best text-to-video artificial intelligence models on the market, offering realistic motion and high-quality video rendering — completely free.
I found the quality to be good, but the lack of an image-to-video model limited its usefulness. It also struggled with slow response times and, while the motion was consistently good, its realism sometimes failed to live up to the hype.
The company is rapidly building on the model, including launching a new dedicated English-language website and community. The latest upgrade finally adds an image-to-video model, which allows more control over how the video looks.
I put it to the test with a series of prompts and here's how it went.
Putting Hailuo MiniMax to the test
To get the most out of the image-to-video model you need to start with a good image, so I turned to Flux 1.1 Pro from Black Forest Labs.
I came up with five fun prompts requiring various degrees of motion, then refined them with the help of ChatGPT to make them as descriptive as possible.
I then gave the resulting images to MiniMax, either with a custom motion prompt or with the image alone as the entire prompt.
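For anyone who wants to repeat the exercise, the test plan can be written down as plain data. This is only a rough sketch: the `VideoTest` class and the `describe` helper are hypothetical names made up for illustration, and nothing here calls Flux, ChatGPT or MiniMax. The titles and motion prompts are the ones used in the five tests in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoTest:
    """One image-to-video test: a source image plus an optional motion prompt.

    Hypothetical helper type for illustration only.
    """
    title: str
    motion_prompt: Optional[str]  # None means the image is the entire prompt


# Titles and motion prompts as used in the five tests in this article.
TESTS = [
    VideoTest("The Astronaut on Mars", "Astronaut running on Mars in a dust storm."),
    VideoTest("Having a conversation", "Having a conversation."),
    VideoTest("Dogs on the beach", "smartphone camera, dog bouncing on the beach."),
    VideoTest("Drone display in London", None),
    VideoTest("Racing car on a mountain", "Fixed camera, car racing into the distance."),
]


def describe(test: VideoTest) -> str:
    """Summarise what is handed to the video model for one test."""
    if test.motion_prompt is None:
        return f"{test.title}: image only"
    return f"{test.title}: image + motion prompt {test.motion_prompt!r}"


if __name__ == "__main__":
    for test in TESTS:
        print(describe(test))
```

Keeping the motion prompt optional mirrors the setup used here, where one test relies on the image alone and the rest add a short line of direction.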
1. The Astronaut on Mars
This prompt tests the model's ability to handle potentially complex motion in an unfamiliar physics environment: the lower gravity of Mars, in the middle of a dust storm.
Image prompt: "A lone astronaut walking on Mars during a dust storm, captured in a dramatic cinematic style. The composition features the astronaut in the center of the frame, silhouetted against swirling clouds of red dust. The lighting is dim and diffused, with sunlight barely piercing through the storm. The color palette is dominated by warm, rusty hues of red and orange, giving the scene a hostile yet awe-inspiring atmosphere. The mood is both adventurous and foreboding, evoking a sense of isolation in an alien landscape. The shot is taken from a low angle, emphasizing the astronaut’s smallness against the vast Martian terrain, with subtle details like wind-battered rock formations in the background."
Motion prompt: "Astronaut running on Mars in a dust storm."
2. Having a conversation
A common test prompt I try with Runway and Kling is having someone talk. Here I generated an image of a woman talking and asked the AI to make it move.
Image prompt: "A young woman having an animated conversation, portrayed in a vibrant street photography style. The composition captures her at a three-quarter angle, with a shallow depth of field to focus on her facial expressions while blurring the bustling city behind her. Natural golden-hour lighting casts a warm glow on her face, highlighting her joyful expression. The color palette is a mix of warm yellows and soft blues, conveying a feeling of energy and life. The mood is lively and spontaneous, with a candid sense of storytelling. The use of a 50mm lens ensures a natural perspective, drawing the viewer into her conversation, while small details like background pedestrians and soft bokeh lights add to the urban atmosphere."
Motion prompt: "Having a conversation."
3. Dogs on the beach
One of the first 'good' AI images I ever saw was of dogs bounding along on a beach and one of the best Sora demo videos was of dogs playing. So I had Flux create an image of a dog in motion, then used Hailuo to make it really move.
Image prompt: "A joyful dog playing on the beach, captured in a whimsical, painterly style. The composition places the dog mid-action, leaping up to catch a thrown ball, with splashes of seawater frozen in the air. The lighting is bright and golden, indicating a late afternoon with the sun low on the horizon, casting long shadows. The color palette is full of warm sandy browns, azure blues of the sea, and golden highlights, enhancing the playful atmosphere. The mood is carefree and energetic, evoking happiness and freedom. The scene is shot from a slightly low perspective to highlight the dog’s enthusiasm, with technical detail focusing on motion blur to convey a sense of movement, and gentle waves in the background adding to the coastal setting."
Motion prompt: "smartphone camera, dog bouncing on the beach."
4. Drone display in London
Drone displays can be magical, but they are limited in scope by the cost and complexity of swarm motion. Can AI video do better? I gave it no motion prompt for this one, so it is all down to the image and the model.
Image prompt: "An incredible drone light display over London, rendered in a futuristic, neon-inspired style. The composition includes the illuminated drones forming intricate patterns in the sky above iconic landmarks like the Tower Bridge and the Shard. The lighting is entirely artificial, featuring bright, multi-coloured lights from the drones against the night sky, contrasted with the warm city lights below. The color palette includes vivid blues, purples, and greens, contributing to a futuristic, magical atmosphere. The mood is one of wonder and amazement, capturing the viewer's imagination. The image is shot from a high vantage point, looking slightly down over the cityscape, with technical details like long exposure used to create light trails, and reflections shimmering on the Thames."
5. Racing car on a mountain
Every AI video model I've tried struggles with vehicle motion. So let's see how well it handles a not-particularly-good image of a sports car racing along.
Image prompt: "A sleek racing car speeding along a winding mountain path, illustrated in a hyper-realistic style. The composition shows the car mid-turn, with motion blur in the background to emphasize its speed. The lighting is natural, with sunlight filtering through the trees and casting dappled shadows on the road. The color palette features the bright red of the car contrasting against the lush green of the surrounding forest, and the muted grey of the asphalt. The mood is intense and exhilarating, evoking the thrill of high-speed racing. The shot is taken from a dynamic side angle, almost level with the car, capturing the feeling of movement and agility. Technical details like sharp focus on the car and depth of field control highlight the precision and power of the scene, with the winding road disappearing into the mountains to add depth."
Motion prompt: "Fixed camera, car racing into the distance."
Final thoughts
Hailuo MiniMax was already impressive. While waiting for these videos to complete, I looked back over some of my previous text-to-video generations, as well as examples from others, and it is very much top tier. Image-to-video takes it up another notch.
One thing that really stood out is just how well it handles consistent motion across the six seconds of video it generates per prompt. I was surprised at how well the model handled the hand movements in the 'woman talking' test.
It isn't perfect. In the beach test the ball disappeared and the dog seemed to change breed halfway through, and the astronaut did a jig at the start, but it is still better than many of the AI video models I've tried.