Study Shows AI Chatbots Still Fall Short on Reliable Health Advice

A new study reveals that, despite performing well on medical exams, AI chatbots do not provide more accurate health guidance than traditional research methods.
“AI is not ready to replace physicians,” said Rebecca Payne, co-author of the study from Oxford University. She warned that relying on chatbots for symptom checks can be risky, potentially leading to incorrect diagnoses or a failure to recognize situations requiring urgent medical attention.
The research team tested how effectively people could use AI chatbots to identify health issues and decide whether they needed medical care. Nearly 1,300 participants in the UK were presented with 10 realistic scenarios, ranging from headaches after drinking to postnatal exhaustion and symptoms of gallstones.
Participants were randomly assigned one of three AI chatbots (OpenAI’s GPT-4o, Meta’s Llama 3, or Command R+), while a control group used standard internet searches. Participants using chatbots correctly identified their health issues only about a third of the time, and correctly determined the appropriate next steps roughly 45 percent of the time, which was no better than the control group.
The researchers attributed the gap between AI exam performance and real-world results to human factors. Participants often failed to provide complete information, misinterpreted chatbot responses, or ignored recommendations altogether.
One in six U.S. adults reportedly consults AI chatbots for health information at least once a month, a number expected to grow as the technology becomes more widespread.