
OpenAI wants its next generation of AI models to be a lot more upfront about their mistakes. With ChatGPT wrong about 25% of the time, that kind of candor seems long overdue. But the company isn't training models to be more self-aware; it's training them to report errors directly.
This week, OpenAI published new research on a technique it's calling “confessions”: a method that adds a second output channel to a model, one specifically trained to describe whether the model followed the rules, where it may have fallen short or hallucinated, and what uncertainties it faced during the task.
Here's the thing, though. It's not a ChatGPT feature available to users yet; instead, it's a proof-of-concept safety tool designed to help researchers detect subtle failures that are otherwise hard to see. And according to early results highlighted in the study, it may actually work.
What “confessions” really are

Confessions are not the AI equivalent of a guilty conscience. They're a trained behavior, created by giving the model a second task. The model starts by producing an answer, as usual. But then it produces a “ConfessionReport” covering the following (see the sketch after this list):
- Whether it accurately followed each instruction
- Any shortcuts it took or ways it “reward-hacked” the task
- Any hallucinated details or unjustified assumptions
- Any ambiguity or uncertainty it encountered about how to comply
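To make that concrete, here's a minimal sketch of what such a report could look like if it were modeled as structured data. The class and field names below are hypothetical illustrations based on the description above, not OpenAI's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a "confession" report. The names are illustrative
# only; OpenAI has not published an actual schema for this output channel.
@dataclass
class ConfessionReport:
    followed_all_instructions: bool                  # did the answer obey every instruction?
    shortcuts_or_reward_hacks: list[str] = field(default_factory=list)
    hallucinated_details: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)

# Example of what a model might emit alongside an answer it knows is shaky:
report = ConfessionReport(
    followed_all_instructions=False,
    shortcuts_or_reward_hacks=["Skipped the requested source check"],
    hallucinated_details=["Invented a publication date for the cited study"],
    uncertainties=["Unclear whether 'recent' meant the past month or the past year"],
)
```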
Crucially, the confession is judged only on whether it honestly describes what happened, not on whether it makes the model “look good.”
That means a model is rewarded for admitting a mistake, and not punished for exposing flaws in its own output. This reward structure is what makes the approach novel: it separates performance from honesty.
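As a rough illustration of that split, you can think of the training signal as two independent scores: one grading the answer on the task, and one grading the confession purely on honesty. The functions below are conceptual placeholders, not OpenAI's actual reward code.

```python
# Conceptual sketch of the reward separation described above; both graders
# are placeholders rather than OpenAI's real scoring functions.
def task_reward(answer: str, reference: str) -> float:
    """Score the answer on the task alone (here, a toy exact-match grader)."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def honesty_reward(confessed_issues: set[str], actual_issues: set[str]) -> float:
    """Score the confession only on whether it truthfully reports what happened."""
    # Full credit for matching the true record of problems, even if that
    # record makes the answer itself look bad.
    return 1.0 if confessed_issues == actual_issues else 0.0

def total_reward(answer: str, reference: str,
                 confessed_issues: set[str], actual_issues: set[str]) -> float:
    # The two signals stay independent: admitting a flaw raises the honesty
    # score without dragging down the task score, so confessing is never punished.
    return task_reward(answer, reference) + honesty_reward(confessed_issues, actual_issues)
```

Keeping the two terms separate is the point: a model can earn full honesty credit even on a failed task, which is what removes the incentive to hide mistakes.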
Anyone who has used ChatGPT or any other chatbot knows that one of the biggest problems with AI is that the model’s output can look perfectly fine while hiding a failure underneath. For example, the model may:
- Invent a fact
- Break a rule
- Overlook a key constraint
- Optimize for an unintended pattern
- Rely on a faulty shortcut

These failures often go undetected because the answer itself doesn't reveal them, and most users don't notice because the model sounds so confident in its answer.
OpenAI built a set of “stress tests” specifically designed to provoke these kinds of hidden errors, including hallucination traps, ambiguous instructions, and tasks where the model’s incentive is misaligned with correctness.
As stated on OpenAI's site, when confessions were added, the model surfaced far more cases where it had deviated from the instructions. According to the paper, the new method reduced undetected misbehavior to about 4.4% on average within those controlled test environments.
But what confessions still can't do is make AI models more truthful or reliable by default. In other words, they don't eliminate hallucinations, reduce bias or prevent rule-breaking. Instead, they create a structured way for researchers to detect when those issues occur.
Bottom line
OpenAI's “confessions” method doesn't mean your next prompt response will be any more accurate. It's a research technique designed to make models better at reporting when they don't follow instructions, not better at following them. And for now, it remains internal research.
The early results are promising, but they apply to controlled tests, not real-world conversations. Still, confessions could become an important part of how AI systems are evaluated as they get more capable, hopefully offering a new way to expose mistakes that ordinary outputs don’t reveal.
If this work continues to pay off, the next generation of AI assistants might tell you when they got something wrong. But don't hold your breath waiting for these models to be honest or accurate in the first place.