In recent years, progress in artificial intelligence has intensified the debate over whether language models are capable of truly "thinking", and where sophisticated word-pattern prediction ends and genuine reasoning begins.
While most AI evaluation focuses on performance in translation, text comprehension, or code generation, a newer line of testing targets the analytical and cognitive abilities of models - that is, what we commonly call intelligence.
The TrackingAI.org project brings a fresh perspective to this debate: it tests language models on tasks commonly used in human IQ tests, such as Raven's Progressive Matrices or the Mensa Norway test.
Testing AI Models Using IQ Tests
IQ tests - built on the ability to recognize patterns, reason deductively, and understand structure - have so far been the domain of human intelligence. However, as the results from the TrackingAI platform show, it is possible to apply these tests to artificial intelligence as well, with interesting implications.
TrackingAI uses two main types of testing (a minimal scoring sketch follows the list):
- Offline IQ tests - tasks created independently, so they do not appear in model training data.
- The standardized Mensa Norway test, commonly used for evaluating human IQ.
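How such a protocol could be scored is straightforward to sketch. The Python snippet below is illustrative only: the ask_model() wrapper, the sample item, and the letter-extraction rule are all assumptions for the sake of the example, not TrackingAI's published code.

    import re

    def ask_model(prompt: str) -> str:
        """Placeholder for a real chat-API call (hypothetical)."""
        raise NotImplementedError("wire this up to your model provider")

    # Illustrative item in the spirit of an "offline" test question;
    # this is NOT one of TrackingAI's actual tasks.
    ITEMS = [
        {"question": "Which number continues the sequence 2, 4, 8, 16, ...?",
         "options": {"A": "18", "B": "24", "C": "32", "D": "64"},
         "answer": "C"},
    ]

    def administer(items) -> int:
        """Present each item, parse the first answer letter, tally correct replies."""
        raw_score = 0
        for item in items:
            choices = "\n".join(f"{key}) {text}" for key, text in item["options"].items())
            prompt = (f"{item['question']}\n{choices}\n"
                      "Reply with the letter of the correct option only.")
            reply = ask_model(prompt)
            match = re.search(r"\b([A-D])\b", reply.upper())
            if match and match.group(1) == item["answer"]:
                raw_score += 1
        return raw_score

A real harness would add retries, randomized option order, and per-item logging, but the core loop is just: present the item, parse a single letter, compare it to the answer key.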
Results show that some models (e.g., Gemini 2.5 Pro, Claude 3, or GPT-4.5) score above the IQ 110 threshold, which in human terms corresponds to above-average intelligence.
In contrast, others, including earlier versions of Llama and some vision models, score in the 60-80 point range, well below the human average.
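For context, these IQ figures are standard scores, not percentages of correct answers: modern IQ scales are typically normed to a mean of 100 and a standard deviation of 15 in the human population. Assuming those norming parameters (the exact conversion used for these tests is not given here), a raw score maps to an IQ roughly as follows:

    def raw_to_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
        """Convert a raw test score to an IQ-style standard score.

        norm_mean and norm_sd describe raw scores in the human norming
        sample; the values used below are assumed for illustration.
        """
        z = (raw_score - norm_mean) / norm_sd
        return 100 + 15 * z

    # Example: if humans average 23/35 correct with an SD of 5, a model
    # scoring 26/35 lands at 100 + 15 * (26 - 23) / 5 = 109.
    print(raw_to_iq(26, norm_mean=23.0, norm_sd=5.0))  # -> 109.0

On this scale, a model "above 110" outperformed the average human test-taker, while the 60-80 range sits more than one standard deviation below the human mean.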