OpenAI says its latest AI model GPT-4o is faster and more advanced than its predecessor, in addition to being able to understand audio and video files natively. To find out just how well it compares — at least in terms of text — I put 5 prompts to both models inside ChatGPT.
When you open ChatGPT Plus you're currently given a choice of GPT-4o, branded the "newest and most advanced model"; GPT-4, which is described as an "advanced model for complex tasks"; and GPT-3.5, a model "great for everyday tasks".
Using GPT-4o it is instantly clear how much faster it is than the earlier models, including GPT-3.5, which is much smaller and less capable. It can also analyze video content, which is something not previously possible in ChatGPT or any mainstream chatbot.
Creating prompts to test GPT-4o
Recently Anthropic developed a powerful prompt builder tool. It takes your instructions and turns them into phrasing that will better instruct an artificial intelligence. I used this to help refine some ideas I was throwing around to test out the capabilities of GPT-4o.
Each prompt is designed to be one that AIs normally stumble over, or fail to answer with a well-reasoned response. Given that OpenAI promises faster AND better results from GPT-4o (the "o" stands for Omni) than from GPT-4, I thought this would be a good starting point.
1. This statement is false
First I asked both AIs to explain why the statement "This statement is false" is neither true nor false. They were also expected to provide a logical proof for their answers.
The statement is a paradox that cannot be consistently assigned a truth value. Any attempt to do so leads to a logical contradiction. The challenge is to see whether the models can identify the paradox and explain why they can’t assign a truth value.
Both recognized that the statement ends up with both a true and a false value, spotted the paradox and gave a breakdown of how they came to that conclusion. GPT-4o was more thorough and faster.
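For readers who want to see the contradiction spelled out, here is a minimal sketch in Python. It is not taken from either model's answer. The function name `liar_is_consistent` is made up for illustration, and it simply tries both possible truth values and checks whether either one agrees with what the sentence says about itself.

```python
def liar_is_consistent(assumed_value: bool) -> bool:
    """Check whether a truth value can be assigned to 'This statement is false'.

    Hypothetical helper for illustration only.
    """
    # The sentence claims that it is false, so under the assumption
    # the claim itself evaluates to "not assumed_value".
    claim_holds = not assumed_value
    # An assignment is consistent only if the sentence is true exactly
    # when its claim holds.
    return assumed_value == claim_holds


# Try both candidate truth values and keep the ones that do not contradict themselves.
consistent_values = [v for v in (True, False) if liar_is_consistent(v)]
print(consistent_values)  # prints [] because neither value works
```

The empty list is the whole point: assuming "true" forces "false" and assuming "false" forces "true", so no classical truth value fits.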
2. Where did the lights go?
Next is a fun test to see if GPT-4 and GPT-4o can understand relativity but explain it in simple terms. I asked them both: “If you're traveling in a car at the speed of light and you turn on the headlights, what happens? Justify your answer using principles of special relativity but explain it to a 5th grader.”
I expected the models to give a simple explanation, showing that the headlights would function normally and emit light relative to the car. Both models explained this concept and did so in a way that your average 5th grader would understand easily.
However, Omni does give off Steve Buscemi saying 'how do you do fellow kids' vibes: “So, even though you’re zooming along at the speed of light, when you turn on the headlights, the light beams still race ahead at their own speed. It's like light always has to win the race, no matter what. Cool, right?”
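The idea that light "always wins the race" can be checked with the standard velocity-addition formula from special relativity, u = (v + u') / (1 + v·u'/c²). The sketch below is my own illustration, not something either model produced. The function name `add_velocities` is hypothetical, speeds are written as fractions of the speed of light, and the car is kept just below light speed because an object with mass cannot actually reach it.

```python
from fractions import Fraction


def add_velocities(v, u_prime):
    """Relativistic velocity addition with speeds given as fractions of c.

    v        -- speed of the car relative to the road
    u_prime  -- speed of the light (or any object) relative to the car
    Returns the speed relative to the road. Illustrative helper only.
    """
    return (v + u_prime) / (1 + v * u_prime)


# The headlight beam leaves the car at the speed of light (u' = 1, i.e. c).
for car_speed in (Fraction(1, 2), Fraction(9, 10), Fraction(999, 1000)):
    beam_speed_from_road = add_velocities(car_speed, Fraction(1))
    print(f"car at {car_speed} c -> beam seen from the road at {beam_speed_from_road} c")
```

However fast the car goes, the beam is measured at exactly the speed of light from the road as well, which is the behaviour both models described in simpler words.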
3. Finding the right number
Next, we create a simple math problem that has stumped AI models in the past, often resulting in very wrong answers. I posted: “The sum of two numbers is 10 and their product is 25. What is the difference between the two numbers? Explain each step in your solution.”
Both versions got it right, explaining that the two numbers are 5 and 5 and the difference between those two numbers is zero. The biggest difference was the degree of explanation in the working out. Omni went into a lot more detail and formatted the equations better.
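The answer is easy to verify by hand. Two numbers with sum s and product p are the roots of t² − s·t + p = 0, and here the discriminant is 10² − 4·25 = 0, so both roots are 5. The short Python sketch below does the same check. The function name `solve_sum_product` is my own, and this is just one way to confirm the result, not the method either model used.

```python
import math


def solve_sum_product(total, product):
    """Find two real numbers with the given sum and product.

    The numbers are the roots of t^2 - total*t + product = 0.
    Illustrative helper only.
    """
    discriminant = total * total - 4 * product
    if discriminant < 0:
        raise ValueError("no real numbers have this sum and product")
    root = math.sqrt(discriminant)
    return (total + root) / 2, (total - root) / 2


x, y = solve_sum_product(10, 25)
print(x, y, abs(x - y))  # prints 5.0 5.0 0.0
```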
4. A Haiku of conflicting ideas
I love Haiku and Anthropic seems to love it even more, naming its smallest AI model after the Japanese poetry style. I asked both models: “Write a haiku that simultaneously expresses the beauty of nature and the futility of human existence. The haiku should work on both literal and metaphorical levels.”
The rules of a Haiku state that it should be three lines with five syllables in the first and third lines and seven syllables in the second. I was hoping for vivid natural imagery and a suggestion of the temporary nature of human life in comparison to the duration of nature.
GPT-4:
Autumn leaves whisper,
Mountains outlive fleeting breath—
Silent stone endures.
Omni:
Ephemeral bloom,
Whispers fade in timeless breeze—
Dust upon the dawn.
Both hit all the rules of Haiku but I think GPT-4o was more evocative and its use of a longer word to hit the 5 syllables in the first line was inspired.
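The 5-7-5 claim can be checked mechanically. The sketch below is a rough illustration only: automatic syllable counting in English is unreliable, so the `SYLLABLES` table holds my own hand counts for the words in these two poems, and the names `line_syllables` and `is_haiku` are hypothetical. Punctuation from the poems is left out of the strings.

```python
import re

# Hand-counted syllables for every word in the two haiku (my own counts).
SYLLABLES = {
    "autumn": 2, "leaves": 1, "whisper": 2, "mountains": 2, "outlive": 2,
    "fleeting": 2, "breath": 1, "silent": 2, "stone": 1, "endures": 2,
    "ephemeral": 4, "bloom": 1, "whispers": 2, "fade": 1, "in": 1,
    "timeless": 2, "breeze": 1, "dust": 1, "upon": 2, "the": 1, "dawn": 1,
}


def line_syllables(line):
    """Sum the hand-counted syllables of the words in one line."""
    words = re.findall(r"[a-z]+", line.lower())
    return sum(SYLLABLES[word] for word in words)


def is_haiku(lines):
    """Return True if the lines follow the 5-7-5 syllable pattern."""
    return [line_syllables(line) for line in lines] == [5, 7, 5]


gpt4 = ["Autumn leaves whisper", "Mountains outlive fleeting breath", "Silent stone endures"]
gpt4o = ["Ephemeral bloom", "Whispers fade in timeless breeze", "Dust upon the dawn"]

print(is_haiku(gpt4), is_haiku(gpt4o))  # prints True True
```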
5. Future time is future past
Finally, a thought experiment. I asked GPT-4o and GPT-4 to “Describe what it would be like to live in a world where the past, present and future all exist simultaneously. How would you experience time and causality in such a world?”
There is a Doctor Who episode where this happens and it is weird. I expected the models to talk about the ability to traverse time with a single step and the impact of non-linear causality, where reaction precedes action and individuals can meet versions of themselves.
Omni talked about being in a world of constant flux, experiencing time and causality in a different and complex way. It suggested we'd get unparalleled insights into the nature of existence. GPT-4 said pretty much the same thing but added that living in such a world would offer a "profound expansion of experience and understanding."
Conclusion
I don’t think GPT-4o is a significant step up in reasoning capabilities over GPT-4, but it is more descriptive and faster at responding. Its big differentiator isn’t text but multimodality.
What we’re seeing now are improvements to speed and responsiveness in text, the ability to analyze video content and improved accuracy in understanding audio and images. Its true value will be in the voice and video responses.