Health Advice From A.I. Chatbots Is Frequently Wrong, in part due to how users are asking their questions (Feb 2026, n=1,298)
Study: Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

Michael Harrop

https://www.nytimes.com/2026/02/09/well/chatgpt-health-advice.html
https://www.nature.com/articles/s41591-025-04074-y

The experiment found that the chatbots were no better than Google — already a flawed source of health information — at guiding users toward the correct diagnoses or helping them determine what they should do next. And the technology posed unique risks, sometimes presenting false information or dramatically changing its advice depending on slight changes in the wording of the questions.

Abstract

Global healthcare providers are exploring the use of large language models (LLMs) to provide medical advice to the public. LLMs now achieve nearly perfect scores on medical licensing exams, but this does not necessarily translate to accurate performance in real-world settings.

We tested whether LLMs can assist members of the public in identifying underlying conditions and choosing a course of action (disposition) in ten medical scenarios in a controlled study with 1,298 participants. Participants were randomly assigned to receive assistance from an LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control).

Tested alone, the LLMs completed the scenarios accurately, correctly identifying conditions in 94.9% of cases and the correct disposition in 56.3% on average. However, participants using the same LLMs identified relevant conditions in fewer than 34.5% of cases and the correct disposition in fewer than 44.2%, both no better than the control group. We identify user interactions as a challenge to the deployment of LLMs for medical advice. Standard benchmarks for medical knowledge and simulated patient interactions do not predict the failures we find with human participants.

Moving forward, we recommend systematic human user testing to evaluate interactive capabilities before public deployments in healthcare.
 
I've been very happy with my coaching by ChatGPT. I am pain-free for the first time in years. And my labs support that what ChatGPT diagnosed is correct, and the changes it suggested were also correct. Beats being diagnosed with anxiety, which is what I got from actual doctors for years.
 