
Google AI Overviews still produce millions of incorrect answers every hour

Google’s AI Overviews are accurate most of the time, but at search-engine scale, their error rate still turns into millions of incorrect answers every hour.

Google’s AI-generated search summaries have improved, but at the scale of Google Search, even a relatively small error rate becomes a major information problem.

According to a New York Times experiment conducted with AI startup Oumi, Google AI Overviews answered correctly around 90% of the time. That also means roughly 1 in 10 answers was wrong.

Applied to the volume of searches Google handles, that error rate translates into millions of incorrect AI-generated answers every hour.

Accuracy has improved, but scale changes the stakes

AI Overviews use Google’s Gemini models to generate short answers directly inside search results.

The system has reportedly improved since earlier testing, rising from around 85% accuracy with Gemini 2.5 to about 91% after the move to Gemini 3. Put another way, the error rate fell from roughly 15% to roughly 9%.

But the problem is not only whether the system is improving. It is that even a 90% accuracy rate still leaves a large number of false answers when deployed across one of the world’s most-used information platforms.
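The scale argument is simple multiplication, and a rough sketch makes it concrete. The accuracy figures below are the ones reported above. The hourly query volume is a hypothetical placeholder chosen only for illustration, not a figure from Google or the New York Times, and the function name is likewise invented for this example.

```python
def expected_wrong_answers(queries_per_hour: int, accuracy: float) -> int:
    """Expected number of incorrect answers per hour at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Error rate is the complement of accuracy; round to a whole answer count.
    return round(queries_per_hour * (1.0 - accuracy))


# HYPOTHETICAL volume, used only to illustrate the arithmetic.
# It is not a reported number of AI Overview queries.
ASSUMED_QUERIES_PER_HOUR = 50_000_000

# Accuracy figures as reported in the article.
REPORTED_ACCURACY = {"Gemini 2.5": 0.85, "Gemini 3": 0.91}

if __name__ == "__main__":
    for model, accuracy in REPORTED_ACCURACY.items():
        wrong = expected_wrong_answers(ASSUMED_QUERIES_PER_HOUR, accuracy)
        print(f"{model}: {accuracy:.0%} accurate, about {wrong:,} wrong answers per hour")
```

Under that assumed volume, the move to Gemini 3 would cut expected wrong answers from about 7.5 million to about 4.5 million per hour. The improvement is real, but the remaining count is still in the millions, which is the point the reporting makes.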

Google disputes the methodology

Google has pushed back on the findings, arguing that the test does not reflect how people actually use Search.

The company also criticized the SimpleQA benchmark used in the experiment, saying it may contain inaccuracies. Google says it uses its own more tightly verified version of the test when evaluating performance.

The company’s position is that the study overstates the real-world problem.

Speed, cost and accuracy remain in tension

AI Overviews do not rely on a single model for every answer.

Google has said the system selects what it considers the most relevant model for each query. More powerful models may produce better results, but they are also slower and more expensive to run at search-engine scale.

That leaves Google balancing accuracy against speed, cost and user experience.

The trust problem is bigger than the error rate

A 90% success rate may sound strong by AI industry standards, but Search is different from a chatbot demo or internal benchmark.

When Google places an AI answer at the top of a results page, many users may treat it as authoritative and never click through to the original sources.

That makes each incorrect answer more consequential. Google acknowledges the risk with its own warning that AI can be wrong and that users should double-check the information.

The issue is not that AI Overviews always fail. It is that they are trusted at a scale where even occasional failure becomes massive.

Sources: New York Times experiment with Oumi, Google statements, Ars Technica
