
Science/Tech

Title: Visualizing AI vs. Human Performance In Technical Tasks
Source: [None]
URL Source: https://www.zerohedge.com/technolog ... an-performance-technical-tasks
Published: Apr 29, 2025
Author: Tyler Durden
Post Date: 2025-04-29 07:05:05 by Horse
Keywords: None
Views: 37

The gap between human and machine reasoning is narrowing...and fast.

Over the past year, AI systems have continued to see rapid advancements, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI systems’ performance relative to human baselines for eight AI benchmarks measuring tasks including:

Image classification

Visual reasoning

Medium-level reading comprehension

English language understanding

Multitask language understanding

Competition-level mathematics

PhD-level science questions

Multimodal understanding and reasoning

This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.

An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.

AI Models Are Surpassing Humans in Technical Tasks

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year  Performance relative to the human baseline (100%)  Task
2023  47.78%                                             PhD-level science questions
2023  93.67%                                             Competition-level mathematics
2023  96.21%                                             Multitask language understanding
2023  71.91%                                             Multimodal understanding and reasoning
2024  108.00%                                            PhD-level science questions
2024  108.78%                                            Competition-level mathematics
2024  102.78%                                            Multitask language understanding
2024  94.67%                                             Multimodal understanding and reasoning
2024  101.78%                                            English language understanding

From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.

The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.

However, the gap is closing quickly.

In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.

This was just 4.4 percentage points below the human benchmark of 82.6%. The o1 model also has one of the lowest hallucination rates out of all AI models.

This was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting the rapid improvement of AI performance in these technical tasks.
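As a rough check on how the MMMU scores relate to the "relative to the human baseline" percentages in the table, here is a minimal Python sketch. It assumes the relative figure is simply the model score divided by the human score; the article does not state the formula, and the function and variable names are illustrative only.

```python
# Scores quoted in the article, in percent. All names here are illustrative.
HUMAN_BASELINE_MMMU = 82.6

MODEL_SCORES_MMMU = {
    "Google Gemini (end of 2023)": 59.4,
    "OpenAI o1 (2024)": 78.2,
}


def relative_to_baseline(model_score: float, baseline: float) -> float:
    """Express a model score as a percentage of the human baseline.

    Assumption: relative performance = model score / human score * 100.
    """
    return model_score / baseline * 100.0


def gap_in_points(model_score: float, baseline: float) -> float:
    """Return how many percentage points the model sits below the baseline."""
    return baseline - model_score


for name, score in MODEL_SCORES_MMMU.items():
    rel = relative_to_baseline(score, HUMAN_BASELINE_MMMU)
    gap = gap_in_points(score, HUMAN_BASELINE_MMMU)
    print(f"{name}: {rel:.2f}% of human baseline, {gap:.1f} points below")
```

Under that assumption, the two scores work out to about 71.91% and 94.67% of the human baseline, which matches the multimodal understanding and reasoning rows for 2023 and 2024 in the table.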



Poster Comment:

X AI is the newest entry into AI. It is less politically biased and is superior to all others except ChatGPT, which is much older. X AI is getting better daily and will soon be the best.
