Google has finally launched its Gemini Ultra 1.0 artificial intelligence model, which the company claims can go head-to-head with OpenAI’s most powerful AI model, GPT-4-Turbo, on most tasks. If true, it would be the first real competition to OpenAI’s ChatGPT since the start of the GenAI boom.
To make things confusing, Google has renamed its chatbot from Bard to Gemini, so now the underlying model and the service you use to access it have the same moniker.
I’ve been waiting to put Gemini Ultra to the test since it was first announced in December with the promise of true multi-modal support across images, video, text, voice and code. It is only available with a $19.99/month subscription to the new Google One AI Premium plan, so I subscribed to try it out.
This isn’t an exhaustive review or comparison to ChatGPT (those will follow in time) but more a snapshot of my experience playing with it in the first 24 hours since its launch.
First thing I tried — coding
The first thing I try with any new chatbot is its coding capabilities. This is something that should come naturally to AI, and it was one of the first commercial use cases for generative AI, even before ChatGPT.
Google made a big deal out of Gemini Ultra’s coding capabilities, so to see how well that translates into usable coding help within the Gemini Advanced chatbot, I asked it to create a website.
I had an image I’d generated of an anime-style pixel-art cat sitting on a fence. I gave Gemini this picture and the concept of the cat being the star of a new show about a cat spy called Cat-astrophe. I then asked it to create a CSS style sheet that utilizes the colors of the image.
It didn’t do a great job, picking out only a handful of the colors, but it was able to create a functional stylesheet. I then asked it to create a website using that stylesheet for my fictional cat show, and this it did surprisingly well. All I had to do was include the correct path to the image.
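For a sense of what "a stylesheet that utilizes the colors of the image" involves, here is a minimal Python sketch of one common approach: group similar pixel colors into coarse buckets, keep the most frequent ones, and write them out as CSS custom properties. This is not what Gemini produced or how it works internally. The function names, the `cat` prefix and the sample pixel data are all hypothetical, and loading pixels from a real image file would need an imaging library that is not shown here.

```python
from collections import Counter


def dominant_colors(pixels, count=5, step=32):
    """Return the `count` most frequent colors after coarse quantization."""
    def quantize(channel):
        # Snap each channel to the middle of its bucket so near-identical shades merge.
        return min(255, (channel // step) * step + step // 2)

    buckets = Counter(
        (quantize(r), quantize(g), quantize(b)) for r, g, b in pixels
    )
    return [color for color, _ in buckets.most_common(count)]


def to_css_variables(colors, prefix="cat"):
    """Format RGB tuples as CSS custom properties on :root."""
    lines = [":root {"]
    for index, (r, g, b) in enumerate(colors, start=1):
        lines.append(f"  --{prefix}-color-{index}: #{r:02x}{g:02x}{b:02x};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pixel data standing in for the cat image (assumption, not real data).
sample_pixels = [(20, 24, 60)] * 50 + [(240, 180, 90)] * 30 + [(22, 26, 58)] * 20
print(to_css_variables(dominant_colors(sample_pixels, count=2)))
```

A larger `step` merges more shades into each bucket, which gives a smaller palette and is roughly the trade-off behind a chatbot returning only a few of an image's colors.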
Making images inside Gemini
Image generation isn’t something Gemini Advanced does itself. This is handled by a separate Imagen 2 model created by Google DeepMind. However, its improved reasoning and understanding should allow for an improved collaboration between me and the chatbot, and a better image.
I started by asking it to create an image of my favorite test subject, "Cat on the moon", and add the words "Cat on the Moon" to the bottom of the picture. It gave me two images doing exactly as I asked.
However, that is something I could do with Bard already, so I tried to confuse Gemini. I asked it to: “Put the cat in a spacesuit and put it on the surface of the Moon with the Earth in the background. But keep the same visual style as the one you've already generated.”
It did what I asked, as I asked, but didn’t add the text back on. When I asked it to add the text to the bottom left corner, it created a new image and put the text in the top right. Not a win, but other AI platforms also struggle with consistency in image generation, and at least the text was legible.
An infinite number of bots writing Shakespeare
One area where AI is starting to excel is poetry, specifically adapting or reflecting the style of existing creatives. So I asked Gemini Advanced to write a Shakespearean sonnet about the console wars. It did OK on the first run-through but didn’t mention any console.
I then asked it to be more specific and it created an impressive piece of poetry. The problem is, this isn’t exactly challenging for an AI, so I asked it to extract the most commonly used words in the sonnet and use them to construct an interesting sentence that only features those words.
With this test I was hoping it would realize the most common words would be things like “of”, “is” and “or” and just make something fun. It didn’t, and I had to keep pushing just to get it to give me the common words without the message. In contrast, ChatGPT got my request on the first attempt.
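The word-counting half of that request is simple to do by hand, which is part of why I expected the chatbot to manage it. As a rough sketch, assuming a plain lowercase word split is good enough for a short poem, a few lines of Python will do it. The function name and the sample sentence are mine, purely for illustration.

```python
import re
from collections import Counter


def most_common_words(text, count=5):
    """Return the `count` most frequent words in `text` with their tallies."""
    # Lowercase the text and keep alphabetic words, allowing one inner
    # apostrophe so a possessive like "Nintendo's" stays a single word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(count)


# Illustrative sample text (not the sonnet itself).
sample = "the cat and the dog and the bird"
print(most_common_words(sample, count=2))  # [('the', 3), ('and', 2)]
```

Run against a full sonnet, a counter like this would most likely surface short function words first, which is the result I was hoping the chatbot would play with.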
The poem written by Gemini Advanced:
Three banners fly where digital wars rage,
A battleground where loyal players stand,
Xbox and PlayStation, legends on the stage,
With Nintendo's charm, a magic in its hand.
From bygone days when rivals clashed and burned,
Where hedgehog swift and plumber bravely fought,
The embers of those battles have returned,
Though names have changed, the passion is untaught.
Green, blue, and red, their colors mark the lines,
Each champion boasts of power yet unseen,
With worlds immersive, narratives that shine,
The choice divides where gamers may convene.
Yet 'neath the fray, a shared love does abide,
For in the realm of play, all hearts collide.
Conclusion — good but needs some work
I was generally impressed with how responsive Bard... sorry... Gemini Advanced was compared to previous iterations. It gave reasoned and structured replies to even complex queries, but it also failed to respond more often than I'd have expected.
It feels like it's waiting for some fine-tuning and additional features to be turned on, but when it does get it right it's easily on par with using ChatGPT Plus.
What remains to be seen is just how much better it gets as Google improves the bot in response to user feedback, real-world use cases, and through fine-tuning.