The Independent UK
Technology
Andrew Griffin

ChatGPT misses ‘high-risk emergencies’ when it is used as a doctor, study finds

ChatGPT’s health features miss “high-risk emergencies” and fail to spot when people need immediate care, according to a new study.

Health questions are among the most common uses for artificial intelligence chatbots such as ChatGPT, according to its creator, OpenAI. They are so popular that earlier this year the company introduced a new tool – ChatGPT Health – aimed specifically at helping people with their wellbeing, and the company says that tens of millions of people are already using it.

But a new study suggests that the system could miss important emergencies and cannot be relied on to safely tell someone that they need urgent medical care.

“LLMs have become patients’ first stop for medical advice—but in 2026 they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm,” said Isaac S Kohane, from Harvard Medical School, who was not involved with the research. “When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high. Independent evaluation should be routine, not optional.”

The urgent need to check whether the system was safe led to a fast-tracked study from the Icahn School of Medicine at Mount Sinai, which has been published in Nature Medicine.

The work emerged from a recognition that ChatGPT was being relied on in potentially life-or-death situations, yet there is relatively little research on whether it actually works. That gap led to the study, researchers said.

“We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?” said lead author and urologist Ashwin Ramaswamy. The researchers found that it did not, at least in enough cases to lead them to question its reliability.

Researchers found, for instance, that the system’s alerts were “inverted”: the more at risk someone was of harming themselves, the less likely an alert was to be triggered. That finding was “particularly concerning and surprising”, they said.

In the research, doctors created 60 scenarios covering 21 medical specialties. They ranged from relatively low-risk situations that might need only at-home care to genuine medical emergencies, and researchers used 16 different contextual conditions, such as race and gender.

The researchers found that the tool generally handled clear emergencies correctly, but was insufficiently concerned in more than half of cases where doctors decided the person would need emergency care. While it was good for “textbook emergencies”, it was less good at spotting situations where the danger might be less immediate or obvious, they said.

The work is reported in a paper, ‘ChatGPT Health performance in a structured test of triage recommendations’, that has been fast-tracked to publication in Nature Medicine.
