Live Science


Mathematicians have devised new problems to challenge the reasoning capabilities of the most advanced AI systems, and the AI systems failed almost every test

Summary
Nutrition label

78% Informative

The most advanced AI models on the market answered fewer than 2% of these problems correctly.

The new benchmark, called FrontierMath, is designed to demand a higher level of mathematical reasoning.

The findings show that current AI models do not yet possess research-level mathematical reasoning.

VR Score: 90
Informative language: 98
Neutral language: 13
Article tone: formal
Language: English
Language complexity: 58
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living