
OpenAI’s Strawberry: Teaching AI to Pause and Think

OpenAI has launched its latest model named “o1,” cheekily referred to by those in the know as "strawberry." Behind the somewhat playful name lies a potentially transformative leap in how these models approach reasoning, problem-solving, and—most intriguingly—how they "think." For an AI world that’s seen everything from basic chatbots to models capable of solving university-level calculus, o1 represents a distinct shift in philosophy.



Where previous models like GPT-4o and its predecessors were powerful and often impressive, they operated largely by pattern recognition and brute force—leveraging massive amounts of data to predict the next word or solve a problem. By contrast, o1 introduces something altogether more nuanced. By design, it aims to engage in something more akin to human thought: deliberation. In layman's terms, o1 doesn’t just produce an answer; it reflects, plans, and even evaluates its options before responding, a marked departure from the purely reactive behavior of earlier AI models.


This isn’t just a theoretical upgrade. Examples provided by Noam Brown, a researcher at OpenAI, illustrate how o1 can handle challenges that have long been the Achilles' heel of language models: puzzles requiring foresight and abstract reasoning. In one instance, models are tasked with playing a game of tic-tac-toe, a game that seems trivial but is fiendishly tricky for most AI models, and o1 performs far better than any previous attempt. It’s not perfect, Brown admits, but it scales better and avoids the blunders that earlier systems would routinely fall into. Similarly, the model tackles a block-stacking problem—another popular test in AI research—in which it has to rearrange blocks without moving a critical piece, outthinking older models that fail the task.


Yet, what really makes o1 stand apart from its predecessors, and from competitors, isn’t just these incremental improvements in performance. It’s how the model is trained. Brown explains that o1 is trained with a technique called "reinforcement learning" (RL) to produce a chain of thought before it answers. RL is nothing new in AI, but the way o1 employs it is novel: the model is given private thinking time. Instead of spitting out a response immediately, it "thinks" privately for a few seconds (or longer), and more thinking time correlates with more accurate results on complex tasks. This shift represents a fundamental rethink of how AI might one day match or surpass human reasoning. The AI is no longer a glorified calculator; it’s becoming a deliberator.
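
To make the idea concrete, here is a minimal sketch of what "thinking before answering" could look like at inference time. It is only an illustration of the general pattern, not OpenAI's actual implementation; the model interface, the think/answer markers, and the token budget are hypothetical stand-ins.

```python
# Minimal sketch of "thinking before answering" at inference time.
# Illustration of the general idea only, not OpenAI's implementation;
# generate_tokens() and the <think>/<answer> markers are hypothetical stand-ins.

def answer_with_deliberation(model, prompt, thinking_budget=1024):
    """Spend up to `thinking_budget` hidden reasoning tokens, then answer.

    The reasoning tokens are never shown to the user; only the text
    produced after the (hypothetical) end-of-thought marker is returned.
    """
    # Phase 1: private chain of thought, capped by a compute budget.
    thoughts = model.generate_tokens(
        prompt + "\n<think>",      # hypothetical marker opening hidden reasoning
        max_tokens=thinking_budget,
        stop="</think>",           # hypothetical marker closing hidden reasoning
    )

    # Phase 2: produce the visible answer, conditioned on the hidden reasoning.
    answer = model.generate_tokens(
        prompt + "\n<think>" + thoughts + "</think>\n<answer>",
        max_tokens=512,
        stop="</answer>",
    )
    return answer  # the chain of thought in `thoughts` stays hidden
```

The key knob here is the thinking budget: raise it and the model gets more room to deliberate, at the cost of a slower response.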


This evolution towards thinking models has wide-ranging implications. It shifts the conversation from sheer computational power to the quality of decision-making. In AI research, there’s a growing consensus that it’s not about brute-forcing larger models anymore but about making them more thoughtful and reflective. OpenAI’s bet is that o1 and its successors could become more than just text generators or problem-solvers—they could become planners, researchers, perhaps even creators of entirely new fields of inquiry.


However, with greater power comes greater complexity. One of the most intriguing revelations from the o1 release is the decision to hide the model's chain of thought (CoT). Unlike earlier models where the process behind an answer was transparent, the o1 team has deliberately chosen to obscure this process from users. Brown mentioned this choice stemmed from a desire to avoid any manipulation or “muzzling” of the AI’s creative processes. But this also raises questions about the ethics of such opacity. As AI models grow more capable, will the public trust an algorithm whose reasoning is concealed from view? What if these hidden thoughts lead to unintended or controversial conclusions?


Benchmarking o1 reveals another important insight. While it doesn't outperform GPT-4o in all domains, particularly in more straightforward tasks where speed is paramount, o1 excels in more cognitively demanding challenges. This, it seems, is the tradeoff: you get a slower response, but in return, you get deeper reasoning. The implications extend far beyond parlor tricks like tic-tac-toe. Brown asks whether we might trust o1 to tackle real-world problems—like diagnosing rare diseases, developing new cancer treatments, or even solving longstanding scientific mysteries like the Riemann Hypothesis. If AI can pause to "think," it opens up the possibility for AI to contribute meaningfully to these problems.


Another element worth noting is how o1's development fits into a broader trend within AI research. OpenAI isn’t the only one working on these reflective models. The Gemini project from Google also focuses on reasoning and planning, though its emphasis differs in critical ways. The race to create AIs capable of "thinking" mirrors, in many respects, humanity’s own evolution of intelligence: not just about storing knowledge, but applying it meaningfully and creatively.


What’s perhaps most exciting—and unnerving—is the potential scalability of o1's approach. Brown hints that because this "thinking" model is no longer bottlenecked by pretraining (the process by which AIs are initially trained on vast datasets), OpenAI can now scale up inference-time computation, meaning the models could soon handle problems of unprecedented complexity. They could think longer and harder than any human can, which, if you think about it, is both thrilling and terrifying.
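
As a concrete example of what "scaling up inference computation" can mean in practice, here is a short sketch of self-consistency sampling, a well-known technique from the research literature (not o1's internal mechanism): sample several independent answers and let a majority vote pick the final one. The `sample_answer` function is a hypothetical stand-in for a model call.

```python
from collections import Counter

# Self-consistency: spend more inference compute by sampling several
# independent answers and taking a majority vote. A generic illustration
# of inference-time scaling, not a description of how o1 works internally.

def self_consistent_answer(sample_answer, prompt, n_samples=16):
    """Sample `n_samples` answers and return the most common one.

    Increasing `n_samples` is the "spend more compute at inference" knob:
    more samples generally means more reliable answers on hard problems,
    at the cost of proportionally more computation.
    """
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common
```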


In many ways, the release of o1 suggests we're entering a new phase in AI development—one that moves beyond the "predictive" nature of past models. It points to an AI future that isn’t just faster or more capable of handling large data sets but is actively deliberating, questioning, and perhaps even solving the problems that have confounded humanity for centuries. However, with this shift comes a raft of ethical and philosophical challenges. What does it mean for a machine to think? Should we trust something we cannot fully understand or see the thought process of? And crucially, how do we integrate these super-intelligent decision-makers into a world that is still navigating what AI means for society?

The age of AI "thinking" has begun, and while strawberry might seem like a sweet name for something so powerful, its impact might be anything but gentle.
