OpenAI has unveiled its new o1 model, which takes a bit longer to respond to queries but is considerably more likely to be accurate and to provide significantly more detailed responses than previous models.
Formerly known as project Strawberry or Q*, this is a reasoning model that takes a prompt and works through how to solve it step by step before answering, rather than responding immediately.
While not perfect for every task, it excels at math, coding, and problems that demand extended thought and analysis. For instance, it can analyze timesheets and shift data for a large store to devise an optimal working pattern.
What is ChatGPT o1?
"here is o1, a series of our most capable and aligned models yet: https://t.co/yzZGNN8HvD o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it." (pic.twitter.com/Qs1HoSDOz1, September 12, 2024)
Currently, the new model is offered in two versions: o1-preview and o1-mini. Somewhat confusingly, it seems that o1-mini is the more powerful model, but with a smaller knowledge base. Reports indicate that o1-preview was trained on an earlier architecture than mini, and the full o1 is deemed too powerful to release without additional security protections and guardrails.
This new model will be especially beneficial to researchers and students, as it has demonstrated PhD-level capability in mathematics and other science, technology, and engineering subjects. I've devised a number of prompts to truly test its limits, but with only 30 messages per week, I've had to find ways to maximize each one. That said, OpenAI reset the rate limit to give Plus and Team users more time to play with the model. It isn't available to free users of ChatGPT.
Tips for prompting ChatGPT o1
With a new type of model come new approaches to prompting. o1 processes a query by working through the problem and thinking about it until it reaches a solution. Therefore, your best strategy is to be as descriptive as possible, outlining every aspect of what you want to achieve, and then letting the AI handle it.
One of my top tips is to use another AI model like GPT-4o or Claude 3.5 Sonnet to refine your basic idea into a workable prompt for o1. This could involve having it outline each step the model needs to take or break the problem down into smaller components.
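To make that tip concrete, here is a minimal sketch of how the request to the helper model could be put together. It only builds the text you would paste into the other model; the function name build_refinement_request and the wording of the instructions are my own assumptions, not anything OpenAI prescribes.

```python
def build_refinement_request(idea: str, requirements: list[str]) -> str:
    """Build a message asking a helper model to expand a rough idea
    into a detailed, step-by-step prompt for a reasoning model.

    Hypothetical helper: the instruction wording is an assumption.
    """
    lines = [
        "Rewrite the rough idea below as a detailed prompt for a reasoning model.",
        "Break the problem into numbered steps and state the expected output format.",
        "",
        f"Rough idea: {idea.strip()}",
    ]
    if requirements:
        lines.append("")
        lines.append("Requirements to cover:")
        # Number each requirement so the helper model can address them one by one.
        lines.extend(f"{i}. {req}" for i, req in enumerate(requirements, start=1))
    return "\n".join(lines)


request = build_refinement_request(
    "A 2D resource management game set on Mars",
    ["Use Python and Pygame", "Specify window size and color scheme"],
)
print(request)
```

The printed text goes to the helper model, and whatever it returns becomes the prompt you spend one of your limited o1 messages on.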
In addition to improved performance and accuracy, o1 also boasts a significantly larger output window. This means it's more capable of generating a full report, writing an entire codebase, or providing a detailed response to a complex query compared to other OpenAI models.
1. A plan to terraform Mars
One of the most impressive things I found when trying o1 was its ability to outline its responses and explain in detail why it responded the way it did. This was a prime example: it broke the response down section by section and gave an explanation for each.
The prompt: “Develop a comprehensive plan to terraform Mars, addressing major challenges such as radiation protection, atmosphere generation, and sustainable resource management. Include estimated timelines and potential technological breakthroughs required.”
You can view the full Mars Terraform report in a Google Doc.
2. A new form of math
My next experiment was a simple prompt holding a complex problem. I wanted a new form of math that didn’t require numbers. But it still had to be functional and the AI had to explain how we could make use of this new math with potential applications.
The prompt: “Design an alternative system of mathematics not based on our current numerical system or logic. Explain its fundamental principles, operations, and potential applications.”
You can read the full detail of "Qualitative Mathematics" in a Google Doc.
3. A new system of local government
After two fairly simple prompts, I went more descriptive with the third test. Here I asked it to come up with a new system of government that solves the problems of our current models.
The prompt: “Design a new system of government that addresses the major shortcomings of current democratic, autocratic, and other existing systems. Your proposal should consider:
Decision-making processes and power structures
Representation and participation of citizens
Checks and balances to prevent abuse of power
Economic model and resource allocation
Approach to law-making and enforcement
Handling of individual rights and collective responsibilities
Methods for adapting to long-term challenges and crises
Integration of technology in governance
Scalability from local to global levels
Evaluate the potential strengths and weaknesses of your proposed system, and discuss how it might be implemented or transitioned to from current forms of government.”
You can see o1's full explanation of "Dynamic Participatory Governance (DPG)" in a Google Doc.
4. A Mars-based resource management game
Code is where o1 really shines. Its ability to generate longer outputs, along with more reasoned and accurate responses, allows it to be more thorough in its code generation. What better test than a Mars colony game? Here it has to create resource management functionality, a UI and a fun gameplay element, all from a single prompt.
The prompt for this is fairly long and comprehensive, so for brevity I’ll include the first line and a summary: “Create a 2D version of Age of Empires set on Mars using Python and Pygame.” It goes on to say “The game should include the following elements and specifications,” including game window size, color schemes, buildings and gameplay mechanics.
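For a sense of what "resource management functionality" means in a game like this, here is a small sketch of the kind of update loop such a game might run each turn. It is not o1's output, and it leaves out the Pygame window and drawing code entirely; the building names, resource names and numbers are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical per-turn effect of each building (positive = produces, negative = consumes).
PRODUCTION = {
    "solar_panel": {"energy": 5},
    "ice_drill": {"water": 3, "energy": -2},
    "greenhouse": {"oxygen": 4, "water": -1, "energy": -1},
}


@dataclass
class Colony:
    # Starting stock levels are assumed values, not taken from the original prompt.
    resources: dict = field(
        default_factory=lambda: {"oxygen": 50, "water": 50, "energy": 50}
    )
    buildings: list = field(default_factory=list)


def tick(colony: Colony) -> None:
    """Advance the colony by one turn, applying every building's effect."""
    for building in colony.buildings:
        for resource, delta in PRODUCTION[building].items():
            # Stock levels never drop below zero.
            colony.resources[resource] = max(0, colony.resources[resource] + delta)


colony = Colony(buildings=["solar_panel", "ice_drill"])
tick(colony)
print(colony.resources)
```

A full game would call something like tick once per turn or frame and then redraw the screen, which is where Pygame would come in.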
5. An emoji-to-English dictionary
Finally, this idea came about after multiple attempts to give it reasoning problems other models couldn’t solve — but the other models kept solving them. I wanted it to come up with a new language, but that seemed a bit generic, so I had it turn emoji into a formal language instead.
The prompt: “Assume a scenario where a group of people can only communicate using emoji. It is how they communicate with one another. Using only widely available emoji create an emoji to English dictionary that would allow someone from that group to communicate with someone outside of the group that speaks English as we know it today. It has to be comprehensive enough to be both conversational and technical.”
You can check out the full Emoji Dictionary and phrase guide in a Google Doc.
Final thoughts
What I found when first using the two different o1 models is that the biggest issue was coming up with ideas to try. They essentially cause the AI to go away, have a think and come back with a more reasoned response. But they don’t have access to any of the features we’ve come to appreciate from modern AI including web access, memory and data analysis.
It is exceptionally good at coding, long-form conceptual work such as the emoji dictionary and problems that require reasoning. One example I saw on X was someone using it to create a work schedule by having it analyze available hours for different employees and required shifts.
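To show what that scheduling problem looks like in practice, here is a minimal sketch that fills each shift with the available employee who has worked the fewest shifts so far. This is a simple greedy approach I picked for illustration, not what o1 or the person on X actually did, and the names, shifts and per-person cap are invented.

```python
from collections import defaultdict


def assign_shifts(availability, shifts, max_shifts_per_person=5):
    """Greedy scheduler: give each shift to the available employee
    with the fewest shifts so far (ties broken alphabetically).

    Shifts nobody can cover are mapped to None.
    """
    load = defaultdict(int)
    schedule = {}
    for shift in shifts:
        candidates = [
            name
            for name, free in availability.items()
            if shift in free and load[name] < max_shifts_per_person
        ]
        if not candidates:
            schedule[shift] = None
            continue
        chosen = min(candidates, key=lambda name: (load[name], name))
        schedule[shift] = chosen
        load[chosen] += 1
    return schedule


# Hypothetical example data.
availability = {
    "Ana": {"Mon AM", "Tue AM"},
    "Ben": {"Mon AM", "Mon PM"},
    "Cy": {"Tue PM"},
}
shifts = ["Mon AM", "Mon PM", "Tue AM", "Tue PM", "Wed AM"]
schedule = assign_shifts(availability, shifts)
print(schedule)
```

A greedy pass like this is not guaranteed to find the best rota, and that is the gap a reasoning model is meant to fill by weighing the trade-offs across the whole week.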
When OpenAI adds the ability to load data files, this will be game-changing in the business space. At home, it could be used to organize the family vacation, working out all the complexities of the trip including timings and schedules.
Right now, with only 30 messages per week (I used half in a day), it's a fun diversion, but for most use cases GPT-4o is more than enough. In fact, GPT-4o mini is more than enough for how the vast majority of people use AI, and Apple Intelligence is as good as that model.