ARC-AGI-3 dropped the same week Jensen Huang declared AGI achieved. Gemini scored 0.37%. GPT-5.4 got 0.26%. Humans hit 100%.
ARC-AGI-3 tests whether models can reason through novel problems, not just recall patterns, a task even top systems still ...