Tom’s Guide
Technology
Ryan Morrison

ChatGPT-4o vs Google Gemini Live — how the new AI assistants stack up

Google Gemini vs GPT 4o.

Google launched a new artificial intelligence product, Gemini Live, at its Google I/O event on Tuesday. We all assumed that is what the Gemini Assistant in Android was supposed to do, but this is Google and anything goes.

If it wasn’t for the fact it comes just one day after OpenAI’s first consumer product event, I’d ponder over whether Gemini Live was launched to take on ChatGPT Voice. Both are built using native multi-modal AI models and have impressive voice and video capabilities.

Currently, the front-runners in the global AI race seem to be OpenAI and Google, with the former seemingly cozying up to Apple and the iPhone and the latter in control of Android. Forget AI devices like the Rabbit r1 or the Humane Pin; the short-term winner is the smartphone.

Both ChatGPT Voice and Gemini Live are being integrated into an existing AI product and neither is available today — but how else do these next-generation assistants compare?

How do Gemini Live and ChatGPT 4o compare?

Google is on the back foot a little when it comes to credibility, especially around showing off live video analysis and voice capabilities. When it announced Gemini Ultra last year, it did so with a video of the model responding to real-time video, only it wasn’t real-time or video.

This time, however, Google made a point of letting I/O attendees try the tech for themselves, or at least the underlying “Project Astra” component, including speech and video conversation.

Both offer a conversational, natural language voice interface, both offer the potential for live video analysis through a smartphone camera and both seem to be fast enough for a truly natural conversation where you can interrupt the AI mid-flow.

However, there are some notable differences. OpenAI’s ChatGPT Voice sounds more natural, can detect and respond to emotion and vocal tones and even adapt in real-time to how you ask it to speak. I didn’t see evidence of that capability from Gemini Live.

The other big difference is around multimodality. Gemini still relies on other models for output including using Imagen 3 for images and Veo for video. GPT-4o is natively multimodal in both directions — the o stands for omni, or in all directions. It creates its own images and sound.

Gemini Live vs GPT-4o: The future of voice assistants


The world seems to be moving towards voice and away from text input. When I first watched the OpenAI announcement my reaction was that this is a paradigm shift in human-computer interface, one as big as the launch of the mouse or the touch screen.

I still hold that view, and the fact that Google is also launching a native, natural-sounding voice interface further cements it. Even Meta has Meta AI, a voice bot available in its VR headsets and the Ray-Ban smart glasses.

While the smartphone might be the winner for now, it’s clear the real form factor for these voice AI models is smart glasses. With cameras at eye height and arms that send sound waves into your ears, they are the perfect AI device.

The question is whether OpenAI moves into hardware and launches its own pair of smart glasses, or whether this is the new Siri and will power a future Apple Glasses product. Another is whether Google is really brave enough to resurrect Google Glass.
