Apertia.ai
Artificial Intelligence in IQ Tests: Who Would Pass Mensa and What Does It Mean?
Artificial Intelligence | April 8, 2025 | 4 min read

Apertia Team

In recent years, artificial intelligence has intensified the debate over how far language models can truly "think" and where advanced prediction of word patterns ends.

While AI evaluation usually focuses on performance in translation, text comprehension, or code generation, a newer line of testing targets the traditional analytical and cognitive abilities of models, that is, what we commonly refer to as intelligence.

The TrackingAI.org project brings a new perspective to this debate: it tests language models on the kinds of tasks used in human IQ tests, such as Raven's Progressive Matrices and the Mensa Norway test.

Testing AI Models Using IQ Tests

IQ tests, which assess the ability to recognize patterns, reason deductively, and understand structure, have so far been the domain of human intelligence. However, as results from the TrackingAI platform show, these tests can be applied to artificial intelligence as well, with interesting implications.

TrackingAI uses two main types of testing:

  1. Offline IQ tests: tasks created independently so that they do not appear in model training data.

  2. The standardized Mensa Norway test, commonly used to evaluate human IQ.

Results show that some models (e.g., Gemini 2.5 Pro, Claude 3, or OpenAI o1 Pro) score above the IQ 110 threshold, which in human terms would correspond to above-average intelligence, while GPT-4.5 lands only slightly above the human average of 100.

In contrast, others, including earlier versions of Llama and some vision models, score in the 60-80 point range, which is below average.
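To make these thresholds more concrete, the sketch below converts an IQ score into a population percentile. It assumes, as is conventional for modern IQ scales, that scores follow a normal distribution with mean 100 and standard deviation 15; the exact norms TrackingAI applies are not stated in this article, so treat the numbers as illustrative. The helper name `iq_percentile` is hypothetical.

```python
from statistics import NormalDist

# Assumption: IQ is normed to mean 100 and standard deviation 15.
# The scale TrackingAI actually uses may differ.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_percentile(iq: float) -> float:
    """Return the percentage of the reference population scoring below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)


# Scores mentioned in the text: the 60-80 band, the human mean,
# the 110 threshold, and the highest average in the results table (122).
for score in (60, 80, 100, 110, 122):
    print(f"IQ {score}: percentile {iq_percentile(score):.1f}")
```

Under these assumptions, an IQ of 110 sits at roughly the 75th percentile, while 80 falls near the 9th, which is why the 60-80 band counts as below average.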

Which AI Models Achieve the Highest IQ?

The table below summarizes results for selected models as of April 2025. The average score is the arithmetic mean of the results from the two test sets:

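| Model            | Offline test | Mensa Norway | Average IQ |
|------------------|--------------|--------------|------------|
| Gemini           | 128          | 116          | 122        |
| OpenAI o1 Pro    | 120          | 110          | 115        |
| Claude           | 120          | 107          | 113.5      |
| OpenAI o3 mini   | 119          | 106          | 112.5      |
| GPT-4.5 Preview  | 106          | 101          | 103.5      |
| Llama 4 Maverick | 106          | 97           | 101.5      |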
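As a quick check of the averaging rule described above, the following sketch recomputes the Average IQ column from the two test scores and ranks the models. The names `RESULTS` and `average_iq` are illustrative, and the scores are copied by hand from the April 2025 table rather than fetched from TrackingAI.

```python
# Scores copied from the table: (offline test, Mensa Norway).
RESULTS = {
    "Gemini": (128, 116),
    "OpenAI o1 Pro": (120, 110),
    "Claude": (120, 107),
    "OpenAI o3 mini": (119, 106),
    "GPT-4.5 Preview": (106, 101),
    "Llama 4 Maverick": (106, 97),
}


def average_iq(offline: int, mensa: int) -> float:
    """Arithmetic mean of the two test scores."""
    return (offline + mensa) / 2


# Sort models by their average score, highest first.
ranking = sorted(RESULTS.items(), key=lambda item: average_iq(*item[1]), reverse=True)

for model, (offline, mensa) in ranking:
    gap = offline - mensa  # how many points higher the offline score is than Mensa Norway
    print(f"{model:<17} avg={average_iq(offline, mensa):6.1f}  offline-minus-Mensa={gap:+d}")
```

Running it reproduces the listed averages (from 122 down to 101.5) and makes one pattern in this snapshot easy to see: every listed model scores higher on the offline set than on Mensa Norway, by 5 to 13 points.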

Interestingly, models that perform well in language comprehension are not always the best at logical tasks. Multimodal architectures, which combine text and visual inputs, do not yet perform consistently across tasks.
