AI tools are now part of everyday life. People use them to search for health advice, understand symptoms, or learn about treatments.
It feels quick and simple. But new research shows that medical answers from chatbots are not always reliable.
Problematic answers
According to WP, a group of researchers from the US, UK, and Canada tested five major systems: ChatGPT, Gemini, Grok, Meta AI, and DeepSeek. The study was published in BMJ Open. Each chatbot answered 50 medical questions. The topics included cancer, vaccines, nutrition, stem cells, and sports performance.
Two medical experts checked every answer, and the results raised concern. About 20 percent of responses were judged highly problematic, around half were considered problematic, and roughly 30 percent were partly problematic. Only a very small number were fully accurate.
None of the chatbots produced completely correct reference lists. Out of 250 answers, only two were fully accepted without issues. Grok had the highest rate of problematic answers at 58 percent. ChatGPT followed with 52 percent. Meta AI was close behind at 50 percent.
The quality of answers depended on the topic. Chatbots did better on vaccines and cancer. These areas have more structured research available online. Even there, about one in four answers still had issues.
Nutrition and sports
The situation was worse in nutrition and sports. These topics often carry conflicting advice online, and scientific agreement is weaker. That led to more confused responses.
Open-ended questions also caused problems. These are the types of questions people usually ask in real life. Around 32 percent of answers in this category were highly problematic. For simple closed questions, the rate dropped to 7 percent.
The study also found major issues with citations. When asked for scientific sources, the chatbots often produced incomplete or incorrect lists. Some references listed the wrong authors. Others were made up entirely.
Experts say this happens because language models do not understand information the way humans do. They predict text based on patterns rather than verifying facts or weighing evidence. Their training data includes scientific papers, but also blogs, forums, and social media.
The researchers note that chatbots can still be useful for general guidance or for helping patients prepare questions for doctors. But they should not be treated as a final medical authority. Health information should always be checked against reliable sources or with professionals.