Google's Imagen 3 has finally arrived in Gemini and is already making waves with its ability to create stunning visuals based on simple prompts. Google boasts that it’s their “highest quality image generation model yet,” and I couldn’t wait to explore its full potential.
AI-driven image generators dominate the AI landscape, with models like DALL-E 3 in ChatGPT and Midjourney getting most of the attention. Gemini previously had access to Imagen 2, but this was removed after some issues with performance.
The new model, built by Google's AI lab DeepMind, offers a fresh take on the process with its sophisticated approach to visual creativity.
My goal was to see how well it handles diverse prompts, from the texture of sushi to the intricate features of a human face. I was impressed by its realistic images but also encountered some quirks that reminded me this is still a developing technology. Here’s what stood out.
Creating the prompts
Using Google Gemini’s Imagen 3 is straightforward. The interface is intuitive, and I like that it allows for prompt adjustments or re-generations, so if the initial result isn’t perfect, it’s easy to iterate without starting from scratch. This balance between speed and flexibility encourages creativity and exploration.
I decided to experiment with a mix of detailed and open-ended prompts. I figured that this was the best way to test Gemini’s creativity while determining whether it responded better to vague or more explicit prompts. Interestingly, I was impressed by the visuals in some cases while noticing an obvious lack of nuance at other times.
1. Plate of sushi
Prompt: “Create an image of a plate of sushi.”
I started with something simple — a plate of sushi. Gemini created a beautifully detailed image, with vibrant colors and textures that made the sushi look appetizing.
I was incredibly impressed by the detail and how the sushi looked as if it had been taken from a print in a magazine. However, it still lacked the creativity I was hoping for because the image seemed quite generic. Yes, the realism was impressive, but it didn’t push boundaries or showcase much variety in artistic style.
2. Cozy living room
Prompt: “Create an image of a cozy living room.”
I decided to be a little vaguer with this one to see what Gemini would create. The result was hilariously bad. The classic furniture, harsh lighting, bland color schemes, and oh, yeah, the chandelier did not set the mood I was hoping to achieve.
When I think of “cozy,” I want a big comfy couch, big windows, soft lighting and a warm blanket. This image was completely off, but I was still impressed by the details, placement of the objects, and overall aesthetic of the room, despite it not being the room I wanted.
3. Majestic tiger in the wild
Prompt: “Create an image of a majestic tiger in the wild.”
For an animal image, I requested a majestic tiger in the wild. The AI delivered on the fur details, but from what I could tell, there was no natural setting. It simply looked like a tiger taking a school picture.
Don’t get me wrong, it’s a visually striking image. However, the tiger’s face had a slightly unnatural look, which distracted from the otherwise impressive composition. Where the AI excels in textures, it falls short in emotional expression.
4. Retro-style movie poster
Prompt: “Create a retro-style movie poster”
In my opinion, this image is where Imagen really shined. It nailed the assignment with a truly eye-catching design. It’s so good that I would wear this design if it were on a t-shirt.
The bold font with vibrant colors that faded into the edges created an authentic retro vibe. The creativity here was simple, yet it seemed as though Gemini grasped the nuances of poster design. So far, this was the most satisfying of all the images I generated.
5. New York City
Prompt: “Create an image of NYC”
Lastly, I wanted to see how it would do with architecture, so I went with NYC. Gemini generated a hyper-realistic image that could have been a photograph.
The image was a technically incredible visual of the skyline, but the sky itself looked almost too perfect, lacking the imperfections that might make it feel less AI-generated. While the image was impressive, it didn’t quite cross the line into believable realism.
Final thoughts
Overall, Gemini’s Imagen impressed me. The detail, texture, and design aesthetics were pretty amazing. However, it’s clear that it needs extra prompting to truly nail down the final visual. It struggles with more nuanced requests and can’t seem to create unique, artistic interpretations.
I like that Imagen is free, but keep in mind that the basic version cannot yet create humans, portraits, or faces. If you’re looking for that feature, you’ll have to go with Gemini Advanced, which you can try for one month at no charge. As a bonus, unlike in ChatGPT, the images are saved as JPG files.
Imagen shows a lot of promise, and with continued updates, it could become a powerful creative tool. However, for now, some aspects feel a bit robotic and slightly off, leaving room for improvement.