A Historic Moment in AI: The Unveiling of OpenAI's o3 Model
Today marks a historic moment for the AI community, one likely to be remembered as the day Artificial General Intelligence (AGI) began to look like a real possibility. OpenAI today announced its new o3 model, the successor to the o1 series, models known for their extended reasoning capabilities.
This development is being discussed as potential AGI because the new system surpassed typical human performance on the ARC-AGI benchmark. ARC-AGI matters because it is resistant to memorization, functioning as a kind of IQ test for machine intelligence. It requires only core knowledge, such as elementary physics and counting, that even a young child possesses, and every ARC puzzle is novel, so the tasks remain challenging no matter how much information a model has memorized.
The Challenge of ARC and AI’s Remarkable Achievement
ARC tests an AI's ability to learn new skills on the fly rather than repeat what it has memorized. ARC-AGI-1 took five years for leading frontier models to go from 0% to 5%. Today, the o3 model achieved a state-of-the-art score of 75.7% on ARC-AGI's semi-private holdout set, making it the new top entry, and it did so within the compute limits of the public leaderboard, marking genuinely groundbreaking progress.
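To make concrete what kind of task ARC poses, here is a minimal, illustrative sketch in Python. The grids and the mirroring rule are invented for illustration and are not real ARC data; an actual solver would have to discover the transformation rather than have it hard-coded.

```python
# Toy ARC-style task (illustrative, not real ARC data).
# A task provides a few input -> output grid pairs; the solver must infer
# the transformation from those examples and apply it to a new test input.
from typing import List

Grid = List[List[int]]

# Hypothetical training pairs; the hidden rule here is "mirror each row".
train_pairs = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 7, 7]], [[0, 5, 5], [7, 7, 0]]),
]
test_input: Grid = [[4, 0, 9], [9, 0, 4]]

def mirror_rows(grid: Grid) -> Grid:
    """Candidate program: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]

# A real solver would search over candidate programs; here we just verify
# that this one is consistent with all training pairs, then apply it.
assert all(mirror_rows(inp) == out for inp, out in train_pairs)
print(mirror_rows(test_input))  # [[9, 0, 4], [4, 0, 9]]
```

Because every task hides a different rule, memorizing solutions to past tasks does not help; the system has to synthesize the rule at test time.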
The model was evaluated in two settings: low compute and high compute. The low-compute setting uses minimal test-time computation and suits simpler tasks, while the high-compute setting is allowed far more computation for complex, multi-step problems. By letting the model think for longer, the high-compute setting surpassed typical human performance on the benchmark.
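OpenAI has not published exactly how o3 spends its extra test-time compute, but one common way to trade compute for accuracy is to sample many independent reasoning chains and keep the majority answer. The sketch below is only an illustration of why a "high" setting costs so much more than a "low" one; `sample_reasoning_chain` is a hypothetical stand-in for a model call, not part of any real API.

```python
# Sketch of scaling test-time compute via self-consistency-style sampling.
# This is NOT a description of o3's internals; `sample_reasoning_chain` is a
# hypothetical stand-in for a single model call that reasons and answers.
import random
from collections import Counter

def sample_reasoning_chain(task: str) -> str:
    # Stand-in: returns a noisy guess so the example runs end to end.
    return random.choice(["42", "42", "42", "17"])

def solve(task: str, num_samples: int) -> str:
    """Sample several independent answers and return the majority vote."""
    answers = [sample_reasoning_chain(task) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]

task = "toy puzzle"
print(solve(task, num_samples=6))     # "low" setting: cheap but noisier
print(solve(task, num_samples=1024))  # "high" setting: far more compute per task
```

The cost grows with the number of samples and the length of each reasoning chain, which is consistent with the per-task costs discussed below.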
Beyond Benchmarks: Toward AGI
The creators of the ARC-AGI benchmark note that while o3's performance signifies a genuine breakthrough in adapting to novelty, the model still fails at some simple tasks, indicating real differences from human intelligence. François Chollet, the benchmark's creator and a significant figure in the AI community, believes this does not yet represent AGI, though it is a meaningful step toward it. He also notes that it remains feasible, without specialist knowledge, to create unsaturated benchmarks that are easy for humans but challenging for AI.
Previously, ARC-AGI was not expected to be solved in under eight years, yet today's results, with scores surpassing 80% in the high-compute setting, show how rapid AI's progress has become. The main practical limitation is compute cost: the high-compute configuration can run to thousands of dollars per task.
Innovation Across Software Engineering and Math
The o3 model also performed exceptionally well on SWE-bench, a formidable software engineering benchmark, revealing promising advances for the field, though it still signals demand for engineers who understand the underlying code. The math benchmarks reinforce this narrative, with o3 scoring up to 96.7% on competition math, nearing benchmark saturation.
As benchmarks approach 95% or higher, further improvements become hard-won victories; some benchmark questions are contaminated or simply unsolvable, which caps the achievable score. Despite this, o3's results on difficult research-level math problems stand out, representing roughly a twentyfold improvement over the previous state of the art.
The Future Trajectory of AI
Noam Brown of OpenAI believes progress will continue at this pace beyond o3. As the definition of AGI shifts and becomes less useful as a term, we can anticipate systems capable of astonishing cognitive tasks by the end of 2025.
OpenAI is enthusiastic about the potential applications of these models once they are released and is encouraging outside experts to test o3 rigorously. The journey toward AGI continues, and it will be fascinating to see how these innovations shape our world.
Source: https://www.youtube.com/watch?v=CRuhyF3oj0c