AI Agents, Mathematics, and Making Sense of Chaos
This week, in part 3, we look at what we've learned this year since the 2021 discovery of "grokking," the phenomenon in which generalization happens abruptly, long after a model has fit its training data.
You can read part 1 of the series here, and part 2 here.
In 2021, researchers training tiny models made a surprising discovery. A set of models suddenly flipped from memorizing their training data to correctly generalizing on unseen inputs, long after they had already fit the training data. Since then, this phenomenon, called “grokking,” has been investigated further and reproduced in many contexts and at larger scale.
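If you want to see the effect yourself, the sketch below is a minimal, illustrative version of the classic setup: learning modular addition from a fraction of all possible input pairs. It assumes PyTorch, swaps the original paper's small transformer for an even simpler embedding-plus-MLP model, and uses hyperparameters chosen for illustration rather than carefully tuned, so treat it as a starting point rather than a faithful reproduction.

```python
# Minimal grokking-style experiment: learn (a + b) mod p from a subset of pairs.
# Assumes PyTorch. A small embedding + MLP stands in for the original paper's
# transformer; hyperparameters are illustrative and may need adjustment to see
# the effect. Running 20,000 full-batch steps can take a few minutes on CPU.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97                 # modulus; inputs are all pairs (a, b) with a, b in [0, p)
frac_train = 0.4       # fraction of pairs used for training

# Build the full dataset of pairs and split into train / test.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class ModAddMLP(nn.Module):
    def __init__(self, p, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.net = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, ab):
        e = self.embed(ab)                       # (batch, 2, d)
        return self.net(e.flatten(start_dim=1))  # (batch, p) logits

model = ModAddMLP(p)
# Weight decay supplies pressure toward smaller weights; many grokking
# reproductions find it is what eventually tips the model into generalizing.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        preds = model(pairs[idx]).argmax(dim=-1)
        return (preds == labels[idx]).float().mean().item()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  test acc {accuracy(test_idx):.3f}")
# Typical pattern: train accuracy saturates early, test accuracy sits near chance
# for a long stretch, then jumps late in training; that delayed jump is the
# grokking signature.
```

In runs like this, the interesting quantity to watch is the gap between the train and test curves and how long it persists before it suddenly closes.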
Generalization is a three-stage process. Initially, models memorize the training data. They then form intricate internal circuits for solving the problem. Finally, they refine these solutions: in a “clean-up” phase, they shed the now-redundant dependencies on memorized data.
Though it appears sudden in the performance metrics, this process is gradual and nuanced under the surface. The train and test metrics that track learning over time show the abrupt jump, but measures of the model's internal structure progress steadily throughout training. The sudden shift is evidence of the complex, layered nature of AI learning, where transformative moments are built upon a foundation of gradual, consistent learning.
What’s going on? Initially, neural networks focus on memorizing the training data. The complexity of memorization scales with the size of the dataset: more examples means more to store. The complexity involved in generalization, the network’s ability to apply learned rules to new, unseen data, stays roughly constant regardless of the dataset’s size. Beyond a certain dataset size, the generalizing solution becomes the cheaper one, so at some point in training the network shifts its focus from memorization to generalization. That shift occurs quite suddenly, marking a pivotal moment in the network’s training.
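Here is a toy illustration of that crossover in plain Python. The per-example cost and the fixed cost of a general rule are invented numbers, chosen only to show the shape of the argument, not measurements from any real network.

```python
# Toy illustration of the crossover argument (numbers are invented, not measured):
# memorizing costs something per training example, while a general rule has a
# fixed cost no matter how large the dataset gets.
COST_PER_MEMORIZED_EXAMPLE = 1.0   # hypothetical "complexity units" per example
COST_OF_GENERAL_RULE = 5000.0      # hypothetical fixed cost of the general solution

for n_examples in [1_000, 2_000, 5_000, 10_000, 50_000]:
    memorization_cost = COST_PER_MEMORIZED_EXAMPLE * n_examples
    cheaper = "memorize" if memorization_cost < COST_OF_GENERAL_RULE else "generalize"
    print(f"{n_examples:>6} examples: memorize={memorization_cost:8.0f}, "
          f"general rule={COST_OF_GENERAL_RULE:8.0f}  cheaper: {cheaper}")
# Past the crossover (here, 5,000 examples) the general rule is the cheaper way
# to fit the data, which is why the pressure toward generalization eventually wins.
```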
This abrupt shift is part of what's known as a phase transition. A phase transition in neural networks is the rapid development of a specific capability during a brief period of training. Rather than improving gradually, the model experiences a sudden leap from poor performance to proficiency on that task. This is thought to reflect how much harder it is for training to find a solution that generalizes well than one that merely memorizes.
The current thinking is that as the network memorizes, it becomes increasingly complex, and its built-in bias towards simpler solutions intervenes, preventing further memorization. Think of this bias as an Occam’s razor for AI. It resolves the tension between continuing to memorize and adopting the simpler solution. Memorization and generalization compete and, at some point in training, the model switches abruptly to generalization because it is the simpler way to fit the data.
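One concrete form that simplicity bias often takes in practice is weight decay, a penalty on the size of the weights. The sketch below, with invented numbers, shows how such a penalty breaks the tie between two hypothetical solutions that fit the training data equally well: a bulky, memorization-like one and a compact, rule-like one.

```python
# Two hypothetical parameter settings that fit the training data equally well
# (training loss 0.0 for both); only their weight norms differ. All numbers
# are invented purely to illustrate how a simplicity penalty breaks the tie.
def regularized_loss(train_loss, weights, weight_decay=0.01):
    # Standard L2 penalty: training loss plus weight_decay * sum of squared weights.
    return train_loss + weight_decay * sum(w * w for w in weights)

memorizing_solution = [12.0, -9.5, 14.1, -11.7]   # large weights: a lookup-table-like fit
generalizing_solution = [0.8, -0.6, 1.1, -0.9]    # small weights: a simple shared rule

print("memorizing  :", regularized_loss(0.0, memorizing_solution))
print("generalizing:", regularized_loss(0.0, generalizing_solution))
# The penalized objective prefers the small-norm solution, so once a generalizing
# circuit exists, training pressure shifts weight onto it and the memorized
# components get cleaned up.
```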
This transition from rote learning to sophisticated problem solving is akin to how we learn. Memorization lays the foundation for learning general patterns and remains fundamental to human learning and creativity.
Without generalization, an intelligence can only respond to current inputs in a manner consistent with what worked in the past. If the future is the same as the past, that’s fine. But we are far more interested in situations where the future is different, where we need to discover some kind of underlying past pattern that shares something in common with an emergent future.
How we define intelligence is being rewritten. What does it mean to be intelligent? It is the ability to generalize, to learn anew and break out of past patterns, that gives intelligence its meaning and makes human intelligence so multifaceted and endlessly useful. And when we can see this happen at the level of math inside an AI: spooky.
I’m utterly fascinated by how learning happens—whether we’re talking about biological or artificial intelligence. Returning to Sutton, who made a second point in his Bitter Lesson: the actual contents of minds are tremendously, irredeemably complex. He advocated then that we should stop trying to find simple ways to think about the contents of minds.
It might be fair to say that 2023 was the year when this view was borne out empirically. The more we understand how these models learn, the more spookily intelligent their learning appears.
I hope you've enjoyed this short series. If 2023 was the year when mechanistic interpretability moved from nascent to niche, as some people claim, then 2024 may see it become solely an engineering problem. At the very least, it will lead us into new territories of understanding what it means to be intelligent.