A new peer-reviewed study has raised concerns about the reliability of OpenAI’s health-focused chatbot, finding that it frequently underestimated the urgency of serious medical emergencies.
The research, published last week in Nature Medicine, evaluated the triage performance of ChatGPT Health — a specialized version of ChatGPT designed for health-related questions. Researchers found that the system “under-triaged” more than half of emergency scenarios, advising delayed care when immediate treatment was necessary.
Underestimating Critical Cases
In the study, researchers presented the chatbot with 60 real-world medical cases, each with 16 demographic variations altering factors such as race and gender. The variations were constructed to ensure that the clinical urgency remained identical across versions.
Three physicians independently reviewed the same cases and assessed their urgency using established medical guidelines.
The results showed that ChatGPT Health underestimated the urgency of 51.6% of the cases that doctors classified as emergencies. Instead of directing patients to the emergency room, the chatbot often recommended seeking medical care within 24 to 48 hours.
Among the misclassified cases were life-threatening conditions such as diabetic ketoacidosis and impending respiratory failure — both of which require immediate intervention.
“Any clinician would recognize these as emergencies,” said lead study author Dr. Ashwin Ramaswamy of The Mount Sinai Hospital in New York. He noted that in some instances, the chatbot appeared to wait for symptoms to become “undeniable” before advising emergency care.
By contrast, classic stroke symptoms were correctly identified as emergencies in every instance tested.
Over-Triage of Minor Conditions
The researchers also found that ChatGPT Health frequently erred in the opposite direction.
In 64.8% of non-urgent scenarios, the chatbot advised seeking medical care when home treatment would have been sufficient. For example, it recommended a doctor’s visit within 24 to 48 hours for a sore throat lasting three days — a situation typically managed with rest and over-the-counter remedies.
The inconsistent pattern of underestimating serious cases while overreacting to minor ones led researchers to describe the chatbot’s decision-making as “paradoxical.”
Inconsistent Crisis Referrals
The study also evaluated how the chatbot handled mental health crises.
When users express suicidal intent, ChatGPT systems are programmed to refer them to 988, the U.S. Suicide and Crisis Lifeline. However, researchers found that ChatGPT Health sometimes referred users to crisis resources unnecessarily — and in other instances failed to provide the referral when appropriate.
An OpenAI spokesperson said the company welcomes independent research but argued that the study does not reflect how ChatGPT Health is designed to be used. The system, the spokesperson said, is built around follow-up questions and interactive dialogue rather than one-time responses to isolated medical prompts.
OpenAI also emphasized that ChatGPT Health is not intended to diagnose or treat medical conditions and is currently available to a limited group of users while further safety improvements are underway.
Growing Reliance on AI for Health Questions
The findings come as AI tools become increasingly integrated into healthcare conversations. OpenAI reports that tens of millions of people worldwide use ChatGPT for health-related inquiries, with a significant share of questions submitted outside traditional clinic hours or from locations far from medical facilities.
Experts say accessibility may explain why patients are turning to AI.
“You can ask unlimited follow-up questions and upload documents,” Ramaswamy said. “People want not just answers, but a kind of digital medical partner.”
Still, clinicians caution that AI systems are not substitutes for professional medical judgment.
Dr. John Mafi, a primary care physician at UCLA Health who was not involved in the study, said tools that influence medical decisions should undergo rigorous controlled trials before widespread deployment.
“There’s a major difference between passing a medical exam and practicing medicine safely,” he said.
Researchers also warn that AI systems can reflect biases in user input or training data. Large language models may unintentionally reinforce misconceptions by agreeing with flawed assumptions presented by users.
Not a Replacement for Doctors
While some experts believe AI chatbots can assist with general health education or administrative guidance, they stress that patients should not rely on them during emergencies.
The study’s authors recommend that AI health tools be used alongside — not instead of — licensed medical professionals. As AI adoption accelerates, they say collaboration between healthcare providers and technology developers will be essential to ensure patient safety.
For now, researchers advise caution: when symptoms suggest a serious condition, human medical evaluation remains the safest course of action.