Science/Tech
Title: Visualizing AI vs. Human Performance in Technical Tasks

The gap between human and machine reasoning is narrowing, and fast. Over the past year, AI systems have continued to advance rapidly, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI system performance relative to human baselines for eight AI benchmarks measuring tasks including:

- Image classification
- Visual reasoning
- Medium-level reading comprehension
- English language understanding
- Multitask language understanding
- Competition-level mathematics
- PhD-level science questions
- Multimodal understanding and reasoning

This visualization is part of Visual Capitalist's AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report. An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.

AI Models Are Surpassing Humans in Technical Tasks

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year   Performance relative to the human baseline (100%)   Task
2023   47.78%    PhD-level science questions
2023   93.67%    Competition-level mathematics
2023   96.21%    Multitask language understanding
2023   71.91%    Multimodal understanding and reasoning
2024   108.00%   PhD-level science questions
2024   108.78%   Competition-level mathematics
2024   102.78%   Multitask language understanding
2024   94.67%    Multimodal understanding and reasoning
2024   101.78%   English language understanding

From ChatGPT to Gemini, many of the world's leading AI models are surpassing the human baseline in a range of technical tasks. The only task where AI systems still haven't caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.

However, the gap is closing quickly. In 2024, OpenAI's o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge. That put it just 4.4 percentage points below the human baseline of 82.6%. The o1 model also has one of the lowest hallucination rates of all AI models. This was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting the rapid improvement of AI performance in these technical tasks.

To dive into all the AI Week content, visit our AI content hub, brought to you by Terzo. To learn more about the global AI industry, check out this graphic that visualizes which countries are winning the AI patent race.
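One note on reading the table: the percentages appear to be each model's raw benchmark score expressed as a share of the human score. The short Python sketch below assumes that definition (the AI Index Report's exact methodology may differ); under this assumption, the MMMU scores quoted above reproduce the two multimodal rows of the table exactly.

    # Sketch of the baseline-relative metric, assuming it is
    # (model score / human score) * 100. This assumption reproduces the
    # table's multimodal rows from the raw MMMU scores quoted in the article.

    def relative_to_baseline(model_score: float, human_score: float) -> float:
        """Model performance as a percentage of the human baseline."""
        return model_score / human_score * 100

    HUMAN_MMMU = 82.6  # human baseline on MMMU, per the article

    for label, score in [("Google Gemini, late 2023", 59.4),
                         ("OpenAI o1, 2024", 78.2)]:
        rel = relative_to_baseline(score, HUMAN_MMMU)
        gap = HUMAN_MMMU - score
        print(f"{label}: {rel:.2f}% of baseline ({gap:.1f} points below human)")

    # Prints:
    # Google Gemini, late 2023: 71.91% of baseline (23.2 points below human)
    # OpenAI o1, 2024: 94.67% of baseline (4.4 points below human)

On this metric, values above 100%, such as the 2024 mathematics and science rows, simply mean the model outscored the human baseline on the underlying benchmark.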
Poster Comment: X AI is the newest entry into AI. It is less politically biased and is superior to all others except ChatGPT, which is much older. X AI is getting better daily and will soon be the best.