Google has an advanced AI lab in the form of DeepMind, and it has been cooking over the past few weeks. The latest release is a new version of the Veo artificial intelligence video model that has the most accurate understanding of physics I've seen from any video tool so far.
First announced at Google I/O earlier this year, Veo competes directly with OpenAI's Sora for a place among the best AI video generators, and the new version takes things to an entirely new level.
Veo 2 brings with it improvements in visual realism as well as a better understanding of physics, ensuring movement is more accurately depicted. One example video shows someone accurately slicing a tomato, something no other video model can achieve — including Sora.
Best Veo 2 AI videos I've found so far
The new Veo model is still in the waitlist phase, but you can sign up to get access when it becomes available through Google Labs. Built into the VideoFX experiment, it will let you create 4K clips up to a minute long.
I haven't tried Veo 2 myself but the videos shared by Google — including one showing bees surrounding a beekeeper — appear more real than anything I've tried. Even Pika 2.0, one of the best so far, doesn't solve the physics issue.
While I wait to get access, I've looked through social media and the Veo 2 website to gather some of the best examples of its capabilities I could find.
I picked the above video because of the way it handles the complex interplay between the individual bees and the beekeeper. The bees look and move naturally, and the beekeeper picks up a jar of honey. This might seem trivial, but each of those elements is something other models struggle with on its own.
Prompt from Google: "The camera floats gently through rows of pastel-painted wooden beehives, buzzing honeybees gliding in and out of frame. The motion settles on the refined farmer standing at the center, his pristine white beekeeping suit gleaming in the golden afternoon light. He lifts a jar of honey, tilting it slightly to catch the light. Behind him, tall sunflowers sway rhythmically in the breeze, their petals glowing in the warm sunlight. The camera tilts upward to reveal a retro farmhouse with mint-green shutters, its walls dappled with shadows from swaying trees. Shot with a 35mm lens on Kodak Portra 400 film, the golden light creates rich textures on the farmer’s gloves, marmalade jar, and weathered wood of the beehives."
A few years ago, when OpenAI first unveiled the DALL-E 3 image model, they used flamingos. I don't know if this was deliberate from Google, but there is more than one flamingo video in the examples. Here, Veo 2 captures the movement of the water, the physics involved in the weight of the dog and the lighting.
Prompt from Google: "A cinematic shot captures a fluffy Cockapoo, perched atop a vibrant pink flamingo float, in a sun-drenched Los Angeles swimming pool. The crystal-clear water sparkles under the bright California sun, reflecting the playful scene. The Cockapoo's fur, a soft blend of white and apricot, is highlighted by the golden sunlight, its floppy ears gently swaying in the breeze. Its happy expression and wagging tail convey pure joy and summer bliss. The vibrant pink flamingo adds a whimsical touch, creating a picture-perfect image of carefree fun in the LA sunshine."
This prompt just made me hungry. It led to me making a coffee. Weirdly, pouring liquid is something other models struggle with, but Veo 2 did it perfectly.
Prompt from Google: "The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. Coffee pours in smooth, swirling motion into a crystal-clear cup, filling it with deep brown layers of crema. Scene ends with a camera swoop into a fresh-cut orange, revealing its bright, juicy segments in stunning macro detail."
Video models have got a lot better at depicting emotion, but they're not perfect and some are better than others. This video shows Veo 2 is one of the good ones.
Prompt from Google: "An extreme close-up shot focuses on the face of a female DJ, her beautiful, voluminous black curly hair framing her features as she becomes completely absorbed in the music. Her eyes are closed, lost in the rhythm, and a slight smile plays on her lips. The camera captures the subtle movements of her head as she nods and sways to the beat, her body instinctively responding to the music pulsating through her headphones and out into the crowd. The shallow depth of field blurs the background. She’s surrounded by vibrant neon colors. The close-up emphasizes her captivating presence and the power of music to transport and transcend."
Finally, this video just captivated me with its complexity. There are so many elements happening within the clip, and it largely holds its visual clarity and motion. The reflection, the motion happening in the mirror, even the reflection of the candle are all elements others may have struggled with.
Prompt from Google: "The camera moves in a slow dolly shot, revealing the opulence of a Renaissance palace chamber adorned with gold-inlaid furniture, velvet drapes, and chandeliers casting soft, flickering light. A queen sits motionless at a gilded desk, her crimson silk gown cascading onto the floor like spilled blood. On the desk lies an unsigned letter, its edges curled with age. The camera frames her from behind, catching the reflection of her stoic face in a massive, ornate mirror. In the background, courtiers murmur, their silhouettes dancing like ghosts in the candlelight. The room feels heavy, every gilded detail amplifying an air of betrayal and paranoia. The color palette alternates between deep, regal reds and cold golds, with chiaroscuro lighting intensifying the drama. Shot on 70mm film for rich texture, evoking the grandeur of historical masterpieces."