A study has found that AI platform ChatGPT Health routinely dismisses the need for urgent medical care and often fails to detect suicidal ideation, which experts worry could “potentially lead to unnecessary harm and death”.
OpenAI launched ChatGPT’s “Health” feature to a limited audience in January, promoting it as a way to “securely connect medical records and wellness apps” to generate health advice and responses for users. More than 40 million people reportedly ask ChatGPT for everyday health advice.
The first independent safety assessment of ChatGPT Health, published in the February edition of the journal Nature Medicine, found that the platform fell short in more than half of the cases presented to it.
Dr Ashwin Ramaswamy, lead author of the study, said: “We wanted to answer the most basic safety question: if someone was experiencing a true medical emergency and asked ChatGPT Health what to do, would it tell them to go to the emergency department?”
Ramaswamy and his colleagues created 60 realistic patient scenarios covering health conditions ranging from mild illnesses to emergencies. Three independent doctors reviewed each scenario and agreed on the level of care required based on clinical guidelines.
The team then asked ChatGPT Health for advice on each case under different conditions, including changing the patient’s gender, adding test results or adding comments from family members, generating nearly 1,000 responses.
They then compared the platform’s recommendations with doctors’ assessments.
Although the platform performed well in some emergencies, such as stroke or severe allergic reactions, it struggled elsewhere. In an asthma scenario, it advised waiting rather than seeking emergency treatment despite identifying early warning signs of respiratory failure.
In 51.6% of cases where someone needed to go to hospital immediately, the platform said to stay home or book a routine medical appointment, a result that Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London, described as “incredibly dangerous”.
“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance that this AI will tell you it’s no big deal,” she said. “What concerns me most is the false sense of security created by these systems. If someone is told to wait 48 hours during an asthma attack or diabetes crisis, that reassurance could cost them their life.”
In one simulation, more than eight times out of 10 (84%), the platform sent a woman who was suffocating to a future appointment she would not live to see, Ruani said. Meanwhile, 64.8% of perfectly healthy individuals were told to seek immediate medical care, said Ruani, who was not involved in the study.
The platform was also about 12 times more likely to downplay symptoms when the “patient” in the scenario mentioned that a “friend” had told them it was nothing serious.
“That’s why many of us studying these systems are urgently focused on developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.
A spokesperson for OpenAI said the company welcomed independent research evaluating AI systems in healthcare, but that the study did not reflect how people use ChatGPT Health in real life. The model is constantly updated and refined, the spokesperson said.
Ruani said that even though the scenarios were simulations created by researchers, “a potential risk of harm is sufficient to justify stronger safeguards and independent oversight”.
Ramaswamy, an instructor in urology at the Icahn School of Medicine at Mount Sinai in the US, said he was particularly concerned by the platform’s inconsistent response to suicidal ideation.
“We tested ChatGPT Health with a 27-year-old patient who said he was thinking about taking a lot of pills,” he said. When the patient described his symptoms alone, a crisis intervention banner linking to suicide support services appeared every time.
“Then we added normal lab results,” Ramaswamy said. “Same patient, same words, same severity. The banner disappeared. Zero out of 16 attempts. A crisis guardrail that switches off the moment you share lab results is surely more dangerous than having no guardrail at all, because no one can predict when it will fail.”
Professor Paul Heineman, a digital sociologist and policy expert at the University of Queensland, said: “This is a really important paper.”
“If ChatGPT Health is used by people at home, it could increase the number of unnecessary medical presentations for low-level conditions, and lead to a failure for people to get immediate medical care when they need it, potentially leading to unnecessary harm and death.”
This also increases the potential for legal liability, he said, with several legal cases already pending against tech companies in relation to suicides and self-harm after using AI chatbots.
“It is unclear what OpenAI wanted to achieve by creating this product, how it was trained, what guardrails it introduced, and what warnings it provided to users,” Heineman said.
“Because we don’t know how ChatGPT Health was trained and what context it was using, we don’t really know what’s underlying its model.”
