Science/Tech

Title: Visualizing AI vs. Human Performance In Technical Tasks
Source: [None]
URL Source: https://www.zerohedge.com/technolog ... an-performance-technical-tasks
Published: Apr 29, 2025
Author: Tyler Durden
Post Date: 2025-04-29 07:05:05 by Horse
Keywords: None
Views: 16

The gap between human and machine reasoning is narrowing...and fast.

Over the past year, AI systems have continued to see rapid advancements, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI systems’ performance relative to human baselines for eight AI benchmarks measuring tasks including:

Image classification

Visual reasoning

Medium-level reading comprehension

English language understanding

Multitask language understanding

Competition-level mathematics

PhD-level science questions

Multimodal understanding and reasoning

This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.

An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.
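As a rough illustration of that definition, the sketch below scores a model against a fixed answer key and reports the percentage answered correctly. The function name `benchmark_accuracy` and the four toy items are hypothetical; real benchmarks use far larger item sets and often more elaborate scoring rules.

```python
def benchmark_accuracy(predictions, answers):
    """Return the percentage of benchmark items answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("benchmark must contain at least one item")
    # Count exact matches between the model's output and the answer key.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy items, not taken from any real benchmark.
answers = ["B", "D", "A", "C"]
predictions = ["B", "D", "C", "C"]

score = benchmark_accuracy(predictions, answers)
print(f"Accuracy: {score:.1f}%")
```

Because every system is graded on the same items with the same key, scores from different models (or from human test-takers) can be compared directly.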

AI Models Are Surpassing Humans in Technical Tasks

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year  Task                                    Performance relative to the human baseline (100%)
2023  PhD-level science questions             47.78%
2023  Competition-level mathematics           93.67%
2023  Multitask language understanding        96.21%
2023  Multimodal understanding and reasoning  71.91%
2024  PhD-level science questions             108.00%
2024  Competition-level mathematics           108.78%
2024  Multitask language understanding        102.78%
2024  Multimodal understanding and reasoning  94.67%
2024  English language understanding          101.78%
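To make the year-over-year movement in these figures easier to see, here is a minimal sketch that computes each task's gain in percentage points and lists the tasks at or above the human baseline in 2024. The values are copied from the table above; the names `relative`, `yearly_gain`, `gains` and `above_baseline` are illustrative. English language understanding is left out because the table gives only a 2024 value for it.

```python
# Relative performance (human baseline = 100%), copied from the table above.
relative = {
    "PhD-level science questions": {2023: 47.78, 2024: 108.00},
    "Competition-level mathematics": {2023: 93.67, 2024: 108.78},
    "Multitask language understanding": {2023: 96.21, 2024: 102.78},
    "Multimodal understanding and reasoning": {2023: 71.91, 2024: 94.67},
}


def yearly_gain(scores):
    """Change from 2023 to 2024, in percentage points."""
    return round(scores[2024] - scores[2023], 2)


gains = {task: yearly_gain(scores) for task, scores in relative.items()}
above_baseline = [task for task, s in relative.items() if s[2024] >= 100.0]

# Print the tasks with the largest gains first.
for task, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{task}: +{gain:.2f} points")
print("At or above the human baseline in 2024:", above_baseline)
```

On these numbers, PhD-level science questions show the largest gain, and multimodal understanding and reasoning is the only one of the four tasks still below 100% in 2024.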

From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.

The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.

However, the gap is closing quickly.

In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.

This was just 4.4 percentage points below the human benchmark of 82.6%. The o1 model also has one of the lowest hallucination rates out of all AI models.

This was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting the rapid improvement of AI performance in these technical tasks.
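The article does not spell out how the "relative to the human baseline" percentages are derived. Dividing each raw MMMU score by the human score of 82.6% reproduces the table's multimodal figures (94.67% for 2024 and 71.91% for 2023), which suggests the normalization is a simple ratio. The sketch below works through that arithmetic under that assumption; the function and variable names are illustrative.

```python
HUMAN_BASELINE = 82.6  # human MMMU score cited in the article, in percent


def relative_to_baseline(score, baseline=HUMAN_BASELINE):
    """Express a raw benchmark score as a percentage of the human baseline."""
    return round(100.0 * score / baseline, 2)


o1_2024 = relative_to_baseline(78.2)      # OpenAI o1, 2024
gemini_2023 = relative_to_baseline(59.4)  # Google Gemini, end of 2023

# Absolute gap between o1 and the human baseline, in percentage points.
gap_points = round(HUMAN_BASELINE - 78.2, 1)

print(f"o1 (2024): {o1_2024}% of the human baseline")
print(f"Gemini (2023): {gemini_2023}% of the human baseline")
print(f"o1 gap to humans: {gap_points} percentage points")
```

Note the two different units: the 4.4-point gap is an absolute difference in raw scores, while 94.67% is a ratio to the human score.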

To dive into all the AI Week content, visit our AI content hub, brought to you by Terzo.

To learn more about the global AI industry, check out this graphic that visualizes which countries are winning the AI patent race.


Poster Comment:

X AI is the newest entry into AI. It is less politically biased and is superior to all others except ChatGPT, which is much older. X AI is getting better daily and will soon be the best.
