OpenAI has unveiled a new evaluation showing that its AI models can already perform many real-world work tasks—often at or near expert level. The findings aim to demonstrate how economically viable today’s AI is across multiple industries.
OpenAI Launches ‘GDPval’

The new test, called GDPval, evaluates AI performance across 44 real-world occupations. It’s designed to ground AI progress in measurable, economically valuable outcomes.
Real-World Tasks, Not Just Benchmarks

Unlike theoretical exams, GDPval puts AI to work on real deliverables—like creating sales brochures, analyzing financial data, or reviewing medical images.
Broad Range of Jobs Tested

The 44 roles span industries including healthcare, law, software, customer service, real estate, and finance—areas where AI could soon play a major role.
AI Versus Industry Experts

According to OpenAI, the best frontier models are already nearing the output quality of professionals in their respective fields, at least on well-defined tasks.
What the Models Did

Sample tasks included assessing skin lesions (nursing), drafting financial memos, designing marketing materials, and identifying regulatory issues in contracts.
Anthropic Comes Out on Top

Anthropic’s Claude Opus 4.1 outperformed all other models, including OpenAI’s own GPT-5, when graded by industry experts on 220 real-world assignments.
GPT-5-High Shows Muscle

A boosted version of GPT-5, called GPT-5-high, produced work rated as good as or better than that of human experts on over 40% of tasks. GPT-4o trailed far behind at just 13.7%.
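As a rough illustration of how a figure like this can be tallied, the sketch below turns a list of per-task expert verdicts into an "as good as or better" rate. The verdict labels, the function name `win_or_tie_rate`, and the sample data are assumptions made up for this example. They are not OpenAI's grading code or GDPval results.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of tasks where graders rated the model's output as
    better than ("win") or as good as ("tie") the human expert's."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Hypothetical verdicts for five tasks (illustrative only, not GDPval data).
sample = ["win", "loss", "tie", "loss", "loss"]
print(f"{win_or_tie_rate(sample):.1%}")
```

Counting ties alongside wins matters here: the reported figures describe output rated equal to or better than the expert's, not output that strictly beats it.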
Carefully Worded Messaging

Despite big performance claims, OpenAI is cautious in its language, saying its models “support” human work rather than replace it outright.
Hallucinations Still a Problem

AI hallucinations—incorrect or made-up content—remain a core weakness, often requiring time-consuming human review to catch errors.
Real Work Isn’t Just Prompts

OpenAI admits most jobs aren’t a series of static tasks. Real-world work often demands nuance, judgment, and adaptability that AI still struggles to replicate.
Future of Work, Redefined

GDPval offers a glimpse into how AI might reshape work—automating repetitive tasks while leaving room for humans to handle the rest. The transformation is already underway.
This article was written and published by Asger Risom, who may have used AI in its preparation.