A new study finds that up to half of AI-generated medical answers are flawed, raising concerns that confident, polished responses may conceal inaccuracies and unreliable sourcing.
AI chatbots are increasingly being used for medical advice, offering fast, confident responses to complex health questions.
But new research suggests those answers may be far less reliable than they appear.
Flawed but fluent
A study published in BMJ Open found that many popular AI chatbots frequently produce misleading or inaccurate medical information. According to the research, nearly half of all responses were classified as problematic, even when they appeared polished and authoritative.
The study tested five widely used tools—ChatGPT, Gemini, Grok, Meta AI and DeepSeek—by asking 50 health-related questions covering topics such as cancer, vaccines, nutrition and athletic performance.
Two medical experts reviewed each response. They found that around 20% were highly problematic, while only a minority avoided significant issues altogether.
Accuracy varies widely
Performance differed depending on the subject matter.
Chatbots handled structured, well-researched areas like vaccines and cancer relatively well, though they still produced incorrect or misleading answers roughly a quarter of the time.
They struggled most with topics like nutrition and fitness, where online information is often inconsistent or poorly evidenced.
Open-ended questions proved especially challenging. About 32% of those responses were rated highly problematic, compared to just 7% for more straightforward, closed questions.
The illusion of credibility
One of the study’s most concerning findings involved references.
When asked to provide scientific sources, chatbots frequently produced incomplete, incorrect or entirely fabricated citations. The median accuracy score for references was just 40%, and none of the systems consistently generated fully reliable lists.
This creates a false sense of authority, as neatly formatted citations can make inaccurate information appear trustworthy to users.
Why errors happen
Researchers say the issue lies in how these systems work.
Large language models do not verify facts. Instead, they generate responses based on patterns in their training data, which can include both reliable research and low-quality online content.
The study deliberately used “red teaming” techniques—questions designed to expose weaknesses—so the results show how chatbots perform under pressure. But they also mirror real-world use, where people often ask vague or leading questions.
A broader pattern
Other recent studies reinforce these concerns.
Research published in Nature Medicine found that while chatbots can sometimes arrive at correct answers, users often misinterpret them, resulting in accuracy rates below 35% in practice.
Another study in JAMA Network Open showed AI systems struggled to suggest correct diagnoses when given limited information, failing more than 80% of the time.
Meanwhile, findings in Nature Communications Medicine revealed that chatbots can repeat and expand on entirely fabricated medical concepts.
Use with caution
Despite these limitations, experts say AI tools can still be useful when used appropriately.
They can help summarise complex topics or assist users in preparing questions for healthcare professionals. However, the study warns against treating chatbots as standalone medical authorities.
Users are advised to verify claims, treat references with skepticism and be cautious of answers that sound confident but lack nuance or disclaimers.
Sources: BMJ Open, Nature Medicine, JAMA Network Open, Nature Communications Medicine