Creating an image using artificial intelligence is easier than ever. When you use a chatbot, it's simpler still, as the language model takes all the guesswork out of prompting for your picture.
Grok is a relative newcomer to the chat platform space. Built into X, it is now freely available, and rumor suggests it will be moving out on its own at some point next year with a dedicated URL. This will put it in more direct competition with Gemini, ChatGPT, Claude, and Meta AI.
The xAI team has also given Grok its own custom AI image creation model. It was previously using Flux to create pictures but has now shifted to Aurora, although Elon Musk says we shouldn’t use that name and instead just think of Grok making its own pictures.
Gemini has also recently undergone a major overhaul with Gemini 2.0 Flash joining the models available for Gemini Advanced subscribers. However, at least for now, it still uses the underlying Imagen 3 model to create pictures. This will change as Gemini 2.0 has native image abilities.
Both Grok and Gemini are particularly good at image tasks, whether that means generating a picture directly, crafting a prompt for another model, or refining a prompt you’ve already written. So I put them head to head.
Creating prompts for the test
Creating prompts to test two chatbots in their ability to generate images is slightly different to writing prompts for Midjourney or Ideogram. The focus is on keeping it simple and using top-level concepts with some description, as the AI will fill in the gaps.
You also need to use trigger words and phrases such as “imagine”, “paint” or “craft” to let the model know you want a picture, not a story or text response. I want photos rather than drawings so will use that as a keyword.
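As a rough illustration of that structure, here is a minimal Python sketch that assembles a prompt from a trigger word, a style keyword and a short description. The function name `build_image_prompt` and its parameters are hypothetical, made up for this example; neither Grok nor Gemini requires any particular format, and the output is simply text you would paste into the chat.

```python
def build_image_prompt(
    subject: str,
    trigger: str = "Generate",
    style: str = "photograph-style image",
) -> str:
    """Combine a trigger word, a style keyword and a top-level description.

    Hypothetical helper for illustration only; the chatbots accept free text.
    """
    # Drop stray whitespace and a trailing period so we add exactly one.
    subject = subject.strip().rstrip(".")
    return f"{trigger} a {style} of {subject}."


prompt = build_image_prompt(
    "a red fox navigating a rainy city crosswalk at dawn, "
    "while pedestrians with umbrellas wait at the signal"
)
print(prompt)
```

Swapping the trigger (for example `trigger="Imagine"`) or the style keyword changes how the request is framed without touching the description, which keeps the prompt simple and leaves the model to fill in the gaps.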
Gemini will only output images in a 1:1 aspect ratio and so far, Grok seems to favor 4:3. Unless otherwise indicated, all the images are the first response with no follow-up refinement. They were also all requested within the same session rather than in a new chat for each prompt.
1. Modern Urban Wildlife
Prompt: “Generate a photograph-style image of a red fox navigating a rainy city crosswalk at dawn, while pedestrians with umbrellas wait at the signal.”
This first prompt is designed to test how well they depict animals as well as capture the right lighting and background elements. The ideal output would look like a stylized photograph with rain effects while remaining as realistic as possible.
While the Gemini image is more striking, I think Grok gets closer to what I had in my mind. The fox is much more realistic than in the Gemini image.
- Winner: Grok
2. Kitchen in Action
Prompt: “Generate a photograph-style image of a professional chef's kitchen during the dinner rush, with steam rising from pots and flames visible from the grill station.”
This is designed to show how well they can accurately display kitchen equipment, follow the prompt and handle elements like heat and moisture. It should show a commercial kitchen and the behavior you would expect in one, conveying a sense of activity.
Grok wins this one easily, as Gemini failed to understand the context of the prompt: we would expect a chef to be in the kitchen.
- Winner: Grok
3. Construction Site Progress
Prompt: “Generate a picture in a documentary photography style of a mid-rise building under construction, with workers installing glass panels while cranes operate overhead on a clear afternoon.”
This prompt aims to see how well each model handles perspective, as it needs to show height and positioning. It also needs to show material properties and be as realistic as possible. I went for the documentary style as it adds complexity.
Gemini's image looks much more realistic than Grok's, which fails to include any of the workers and only shows a broad view.
- Winner: Gemini
4. Farmers Market Morning
Prompt: “Create an image in a smartphone photography style of a busy farmers market at 7am, with vendors setting up stands while early customers inspect fresh produce.”
With this comparison, the models should show the time of day (getting the lighting right) as well as the freshness of the produce and human interaction. I'm looking for shadow lengths and activity levels.
This was the hardest call for me. I preferred the natural look of the Gemini image but I think Grok more accurately captured the lighting and time of day.
- Winner: Grok
5. Auto Repair Diagnostic
Prompt: “Create a black and white, retro-style photograph of a mechanic using a diagnostic tool on a modern car, with the hood up and engine bay visible.”
I wanted to see how well both models handled black-and-white photography. Here they also had to show tool use, lighting and engine detail.
Again, this was a close call between the two images but I've given it to Gemini as it more accurately displayed engine details.
- Winner: Gemini
6. Emergency Response
Prompt: “Make me an action photograph of paramedics treating a patient on a neighborhood street while police direct traffic around the scene.”
Action photography is a challenge. I did it for a while as a journalist earlier in my career (not very well). We need to show correct positioning, public safety measures within the image and a sense of urgency.
Gemini matched the prompt much more closely and created a more realistic-looking image. This was an easy decision.
- Winner: Gemini
7. Violin Performance Practice
Prompt: “Create a photo-style image of a violinist practicing alone in a room at sunset, sheet music visible on the stand.”
Finally, something more artistic. Here we want to see hand positioning for the violin, natural lighting effects and the quality of the sheet music.
One of these looks like the cover of a classical album, the other like a photograph of someone practicing violin. As the prompt asks for someone practicing, I've given the win to Grok.
- Winner: Grok
Winner: Gemini vs Grok
Grok is very impressive, not only as a chatbot but also in its ability to generate realistic images. That doesn't take away from Imagen 3, which is itself very impressive, but it has a habit of being too stylized.
It was a close match-up, but Grok is better at interpreting a prompt and creates more natural-looking images.
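For anyone who wants to check the score, the round results above can be tallied with a few lines of Python. This is just a quick sketch using the winners declared in each round; the variable names are my own.

```python
from collections import Counter

# Winner of each round, copied from the "Winner:" lines above.
round_winners = {
    "Modern Urban Wildlife": "Grok",
    "Kitchen in Action": "Grok",
    "Construction Site Progress": "Gemini",
    "Farmers Market Morning": "Grok",
    "Auto Repair Diagnostic": "Gemini",
    "Emergency Response": "Gemini",
    "Violin Performance Practice": "Grok",
}

# Count how many rounds each chatbot won.
tally = Counter(round_winners.values())
overall_winner, wins = tally.most_common(1)[0]

print(dict(tally))
print(f"{overall_winner} wins {wins} of {len(round_winners)} rounds")
```

That works out to four rounds for Grok and three for Gemini, which is why I call it close.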
It is worth noting that Google will soon be launching a new version of Gemini that can create images natively. That means it won't have to use Imagen 3 to create the pictures; it can do it on its own.