Social media is buzzing with rumors of a big OpenAI announcement. This has been sparked by the success of Meta’s Llama 3 (with a bigger model coming in July) as well as a cryptic series of images shared by the AI lab showing the number 22.
As April 22 is OpenAI CEO Sam Altman's 39th birthday, the rumor mill is postulating that the company will drop something big, such as Sora or even the much-anticipated GPT-5.
If it is the latter and we get a major new AI model, it will be a significant moment in artificial intelligence, as Altman has previously declared that GPT-5 will be "significantly better" than its predecessor and will take people by surprise.
I personally think it is more likely to be something like GPT-4.5, or even a new update to DALL-E, OpenAI's image generation model, but here is everything we know about GPT-5 just in case.
What do we know about GPT-5?
We know very little about GPT-5, as OpenAI has remained largely tight-lipped on the performance and functionality of its next-generation model. We do know it will be "materially better," as Altman has made that declaration more than once in interviews.
Each new large language model from OpenAI is a significant improvement on the previous generation across reasoning, coding, knowledge and conversation. GPT-5 will be no different.
It has reportedly been in training since late last year, and it will either have significantly more than the roughly 1.5 trillion parameters GPT-4 is rumored to have, or a similar number with a stronger underlying architecture, allowing for a major performance improvement without increasing the overall model size.
This is something we’ve seen from others such as Meta with Llama 3 70B, a model much smaller than the likes of GPT-3.5 but performing at a similar level in benchmarks.
GPT-5 is very likely going to be multimodal, meaning it can take input from more than just text, though to what extent is unclear. Google's Gemini 1.5 models can understand text, image, video, speech, code, spatial information and even music. GPT-5 is likely to have similar capabilities.
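For context, OpenAI's current API already accepts mixed text-and-image input with GPT-4 Turbo. Below is a minimal sketch using the official Python SDK; whether GPT-5 keeps this exact message format (and the eventual "gpt-5" model name) is an assumption, not something OpenAI has confirmed.

```python
# A minimal sketch of multimodal input using the current OpenAI Python SDK.
# This vision-style message format works with GPT-4 Turbo today; GPT-5
# keeping the same shape is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # hypothetically "gpt-5" once released
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```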
What will GPT-5 be able to do?
One of the biggest changes we might see with GPT-5 over previous versions is a shift in focus from chatbot to agent. This would allow the AI model to assign tasks to sub-models or connect to different services and perform real-world actions on its own.
This is an area the whole industry is exploring, and it is part of the magic behind the Rabbit r1 AI device. It lets a user do more than just ask the AI a question; instead, you could ask the AI to handle calls, book flights or create a spreadsheet from data it gathered elsewhere.
One potential use for agents is in managing everyday life tasks. You could give ChatGPT with GPT-5 your dietary requirements, access to your smart fridge camera and your grocery store account, and it could automatically order refills without you having to be involved.
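The closest building block in OpenAI's API today is tool calling, where the model decides which of your declared functions to invoke and your own code carries out the action. Here is a rough sketch of the grocery scenario built on that mechanism; the order_refill function and its schema are hypothetical illustrations, not a real integration.

```python
# A minimal sketch of the agent idea using OpenAI's existing tool-calling API.
# The order_refill function is a hypothetical grocery-store integration;
# only the tools/tool_calls plumbing shown here exists in the API today.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "order_refill",  # hypothetical example function
            "description": "Order a grocery refill for an item running low",
            "parameters": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["item", "quantity"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4-turbo",  # an agent-focused GPT-5 is speculation for now
    messages=[
        {"role": "user", "content": "The fridge camera shows we are out of milk."}
    ],
    tools=tools,
)

# If the model chose to call the tool, hand the arguments to your own code,
# which would actually place the order.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "order_refill":
        args = json.loads(call.function.arguments)
        print(f"Would order {args['quantity']}x {args['item']}")
```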
I think this is unlikely to happen this year, but agents are certainly the direction of travel for the AI industry, especially as more smart devices and systems become connected.
How different will GPT-5 be?
One thing we might see with GPT-5, particularly in ChatGPT, is OpenAI following Google's lead with Gemini and giving it internet access by default. This would remove the knowledge-cutoff problem, where the model only knows about the world up to the date its training data ends.
Expanded multimodality will also likely mean interacting with GPT-5 by voice, image or video becomes the default rather than an extra option. This would make it easier for OpenAI to turn ChatGPT into a smart assistant like Siri or Google Gemini.
Finally, I think the context window will be much larger than is currently the case. GPT-4 Turbo's window is about 128,000 tokens, which is how much of the conversation the model can hold in memory before it forgets what you said at the start of a chat.
We're already seeing some models, such as Gemini 1.5 Pro, with million-plus-token context windows, and these larger windows are essential for video analysis because video carries far more data than simple text or a still image.
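To make those numbers concrete, you can count tokens yourself with tiktoken, OpenAI's open-source tokenizer. The sketch below uses the cl100k_base encoding that GPT-4 uses; whatever encoding GPT-5 might use is unknown.

```python
# A rough illustration of what a 128,000-token context window means,
# using OpenAI's open-source tiktoken tokenizer. cl100k_base is GPT-4's
# encoding; GPT-5's is an unknown.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

message = "Remind me what I said at the start of this chat."
tokens = enc.encode(message)
print(f"{len(tokens)} tokens: {tokens}")

# A common rule of thumb is roughly 0.75 English words per token, so a
# 128K-token window holds on the order of 96,000 words (a few hundred
# pages), and a 1M-token window like Gemini 1.5 Pro's holds about
# eight times more.
words_per_token = 0.75
print(f"~{int(128_000 * words_per_token):,} words fit in a 128K context")
```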
Bring out the robots
One of the biggest trends in generative AI this past year has been providing a brain for humanoid robots, allowing them to perform tasks on their own without a developer having to program every action and command in advance.
OpenAI has invested heavily in robotics startup Figure, whose Figure 01 robot is powered by GPT-4, and GPT-5 will likely include spatial awareness data as part of its training to make this even more reliable and capable, with a better understanding of how humans interact with the world.
Nvidia is also working on AI models in this space that will be widely available, and AI21 co-founder Professor Amnon Shashua has launched Mentee Robotics to create GenAI-powered robots that could find their way into homes and workplaces as early as next year.
Google is also building generative AI-powered robots that could use future versions of the Gemini models, especially those with massive context windows, and Meta is training Llama to understand spatial information for more competent AI-based AR devices like its smart glasses.
What this all means
"The gap between open and closed source LLMs is narrowing! Inevitably, it will fully close and OSS will catch up by the end of the year! Even with GPT-5 in the arena!" (post on X, April 13, 2024)
Essentially we’re starting to get to a point — as Meta’s chief AI scientist Yann LeCun predicts — where our entire digital lives go through an AI filter. Agents and multimodality in GPT-5 mean these AI models can perform tasks on our behalf, and robots put AI in the real world.
OpenAI is facing increasing competition from open source models from companies like Mistral and Meta, as well as direct competitors like Anthropic with Claude and Google with Gemini. You then have Microsoft shifting away from its reliance on OpenAI — although I still think OpenAI will feature at Build 2024 in May.
Before we see GPT-5 I think OpenAI will release an intermediate version such as GPT-4.5 with more up to date training data, a larger context window and improved performance. GPT-3.5 was a significant step up from the base GPT-3 model and kickstarted ChatGPT.
Altman says OpenAI has a number of exciting models and products to release this year, including Sora, possibly the AI voice product Voice Engine, and some form of next-gen AI language model.