Grok 4, the latest artificial intelligence model from xAI, was the first model to break the 15% barrier on the extremely challenging ARC AGI 2.0 test. It is billed as being smarter than an entire generation of graduate students.
While many language models attract attention with their parameter counts and multimodality, Grok 4 takes a different path. It combines performance with an architecture designed for deep understanding, deductive reasoning, and the ability to assist developers in their daily work. In this article, we'll look at how Grok performs in practice and, most importantly, what it offers programmers who are looking for more than just an automatic syntax generator.
Artificial Intelligence | July 11, 2025 | 2 min
Development with Grok 4 Is Faster, Smarter, and More Dynamic
Apertia Team
Apertia.ai
Grok 4 achieved 15.3% on the ARC AGI 2.0 test, one of the most challenging tests of general intelligence, focused on the ability to solve logical, mathematical, and language problems similar to IQ tests. For comparison, GPT-4 and Gemini 2.5 Pro score around 13%, and Claude 3 Opus around 10-11%, all below the 15% threshold.
This result points to a high level of reasoning ability, a key capability for development tasks that require more than just syntax generation. (Source: ARC benchmark, Allen Institute for AI)
These results suggest that Grok 4 is not just a PR product but genuinely ranks among the best models on the market for solving highly complex tasks and non-deterministic scenarios.
For developers, this means the model better understands intent, infers logical connections, and can propose solutions in context rather than relying only on learned patterns.
| Model | ARC AGI 2.0 Score | HumanEval | Codeforces Rank |
|---|---|---|---|
| Grok 4 | 15.3% | 75-78% | Master (~2100) |
| GPT-4 (OpenAI) | 12-14% | 67-72% | Candidate Master |
| Claude 3 Opus | 10-11% | 70-75% | ~Expert |
| Gemini 2.5 Pro | 13% | 76-80% | Master |
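
Because several cells in the table are ranges rather than single values, a quick way to compare models is to reduce each range to its midpoint and sort. The sketch below does that in Python. The scores are copied from the table above; the dictionary layout, the `midpoint` and `rank_by` helper names, and the choice of the midpoint as a summary are illustrative assumptions, not part of any benchmark's methodology.

```python
# Scores copied from the table above; each entry is a (low, high) range in percent.
# Single values are stored as a range with identical bounds.
SCORES = {
    "Grok 4": {"arc_agi_2": (15.3, 15.3), "humaneval": (75.0, 78.0)},
    "GPT-4 (OpenAI)": {"arc_agi_2": (12.0, 14.0), "humaneval": (67.0, 72.0)},
    "Claude 3 Opus": {"arc_agi_2": (10.0, 11.0), "humaneval": (70.0, 75.0)},
    "Gemini 2.5 Pro": {"arc_agi_2": (13.0, 13.0), "humaneval": (76.0, 80.0)},
}


def midpoint(bounds):
    """Collapse a (low, high) range into one number (an assumed summary)."""
    low, high = bounds
    return (low + high) / 2


def rank_by(metric):
    """Return model names sorted from highest to lowest midpoint on `metric`."""
    return sorted(
        SCORES,
        key=lambda name: midpoint(SCORES[name][metric]),
        reverse=True,
    )


if __name__ == "__main__":
    for metric in ("arc_agi_2", "humaneval"):
        print(metric, rank_by(metric))
```

On the table's own numbers, Grok 4 leads on ARC AGI 2.0, while Gemini 2.5 Pro's HumanEval range sits slightly above Grok 4's, so the ranking depends on which benchmark matters for the task at hand.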


