Google hosts Google I/O 2024 Tuesday (May 14), and while there will be new updates across mobile, home and wearable devices during the annual developer event, AI will be center stage.
Google Gemini is the search giant's family of artificial intelligence models that's increasingly finding a space front-and-center in everything Google makes, from replacing Assistant on Android to powering analysis in search results.
What we’ll likely see at I/O is a new version of Gemini, further integration across yet more products and multimodal features coming to the Gemini chatbot, giving it the ability to take in speech, code, music and video for the first time. (You can find out what Google announces for yourself with our guide on how to stream the Google I/O keynote.)
Rumors suggest that we may also see Gemini adopt some of the more prominent features of rival OpenAI's ChatGPT, including persistent memory across all conversations. But with OpenAI unveiling its new GPT-4o model with a built-in voice assistant and vision features, Google will have to play catch-up.
What to expect from the Gemini models
Google likes to confuse people, or at least that’s how it feels sometimes. The name Gemini applies to the underlying large language models, the Assistant replacement on Android, the chatbot and the AI autocomplete in Workspace.
To confuse things even further, there are three versions of Gemini. Nano runs on phones and other small devices. Pro runs in the cloud and powers the Assistant and the free version of the Gemini chatbot. Ultra is the most powerful model, at least on paper, and it powers the $20/month Gemini Advanced.
Earlier this year, Google unveiled Gemini Pro 1.5. This was a big upgrade over the previous generation of Gemini, adding better understanding, music and video input and a massive million-token context window, which is how much data it can store and reference from a single conversation.
Gemini Pro 1.5 is still only available to developers and researchers. While it doesn't have the reasoning of Gemini Ultra, in many ways it is more powerful.
At Google I/O, I suspect Google will close this gap with 1.5 version upgrades to each of the free models in the family. They are also likely to be made available in the Gemini chatbot and Android Assistant.
New AI features at Google I/O
One more day until #GoogleIO! We’re feeling 🤩. See you tomorrow for the latest news about AI, Search and more. pic.twitter.com/QiS1G8GBf9 (May 13, 2024)
Google has already teased a new version of Gemini, which leverages Google's voice assistant and video features to describe what's going on in your camera's view and provide assistance. We expect to hear a lot more about this feature.
Gemini can do a lot more than is currently possible through its chat or voice interfaces, including taking in video and music content. I suspect both interfaces will be upgraded to add these new input options at I/O.
I think we will also see integration with other Google products and services, bringing more generative AI features to Photos, Docs and Slides. These will also be more tightly integrated into the Gemini Assistant and chatbot.
One of the more useful aspects of Gemini over ChatGPT is its deep integration with the Google ecosystem. Accessible via extensions, this includes access to search, flights, maps, all of your documents and, of course, YouTube. Even YouTube Music is joining this extensions list, albeit only in the Android Assistant version of Gemini.
While unlikely, one thing we might see is Google adding third-party providers to the extensions list. This would mirror functionality available in ChatGPT and Microsoft Copilot. If Google does this, we could see companies like Uber and Kayak plug into Gemini. In Assistant, for example, you could then plan a trip and manage all bookings from within chat.
Google vs. the AI competition
AI is moving away from text and toward voice, as seen in every AI lab working on synthetic voice solutions.
We’re also moving away from chat and toward agents, where you instruct the AI to perform a series of tasks on your behalf rather than just having a friendly chat.
This is something we are already seeing from OpenAI. Apple is also said to be looking at this as an approach for Siri 2.0, which we expect to see at WWDC 2024 next month. And to some extent, Google is doing versions of this with the Gemini Assistant on Android.