AI Agents, Mathematics, and Making Sense of Chaos
New research on developing AI that builds robust world models cultivates an agent's drive to seek out surprise and novelty, enabling it to navigate and understand the complexities of the world through self-driven exploration rather than by following predetermined reward pathways.
Think of how a child explores a new space. Without any specific instructions or incentives, they touch, test, and learn, absorbing details about their surroundings. Capturing this essence of exploration and learning in artificial intelligence represents a significant challenge, one that we intuitively understand is necessary for agentic AI.
Marc Rigter, Minqi Jiang, and Ingmar Posner from the University of Oxford and UCL recently published research on developing AI that builds robust world models. This approach could be vital because it focuses on cultivating an AI's intrinsic drive to seek out surprise and novelty, enabling it to navigate and understand the complexities of the world through self-driven exploration rather than just following predetermined reward pathways.
First, some background for anyone unfamiliar with the idea of a "world model," a concept that has been the subject of much debate and research in the AI community. A world model is essentially an internal representation that an AI agent builds of its environment, encompassing its understanding of the world's dynamics and the consequences of its actions. Some argue that a world model is crucial for an AI system to exhibit intelligent behavior because it allows the agent to reason about the world, plan for the future, and make informed decisions. Others contend that intelligent behavior can emerge from model-free approaches, where the agent learns directly from its interactions with the environment without explicitly constructing a world model.
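To make that idea concrete, here is a minimal sketch (my own illustration with invented names, not the authors' architecture) of a world model as a learned function that predicts the next state of the environment from the current state and an action. Its prediction error is one natural measure of "surprise."

```python
import numpy as np

class ToyWorldModel:
    """A minimal, hypothetical world model: it learns to predict the next
    state of the environment from the current state and an action.
    Real world models are far more sophisticated; this sketch only
    illustrates the interface."""

    def __init__(self, state_dim, action_dim, lr=0.01):
        # A single linear map from [state, action] to next state,
        # standing in for a learned dynamics model.
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        # One gradient step on squared prediction error.
        x = np.concatenate([state, action])
        error = self.predict(state, action) - next_state
        self.W -= self.lr * np.outer(error, x)
        return float(np.mean(error ** 2))  # prediction error as "surprise"
```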
Many AI systems use reinforcement learning (RL) as a foundational mechanism for understanding their environment. It is the key method by which AI systems learn from their surroundings, refining their actions based on a system of rewards and penalties. RL hinges on being able to construct a reward function that signals to the AI which outcomes are favorable and should be sought. While RL has propelled major advances in AI's ability to engage with complex tasks, it introduces its own set of challenges and considerations: it requires the designer to have a clear understanding of the task at hand and to be able to define the reward function precisely. In many real-world scenarios, this is not possible or practical.
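To see why a hand-written reward function is a burden on the designer, here is a toy sketch (my own example, not from the paper) for a simple grid-world task. Every judgment about which outcomes count as favorable has to be spelled out in advance.

```python
def reward(state, action, next_state, goal=(4, 4)):
    """A hand-crafted reward function for a hypothetical grid world.
    The designer must decide, in advance, what outcomes are favorable."""
    if next_state == goal:
        return 10.0   # reaching the goal is good
    if action == "wait":
        return -0.5   # discourage idling
    return -0.1       # small step cost to encourage short paths
```

In open-ended, real-world settings there is rarely a tidy goal cell or obvious step cost to write down, which is what motivates the reward-free approaches discussed next.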
Recent advances in AI, and the race to build agentic AI and AGI, have made learning world models a particularly hot topic. In the reward-free paradigm, the AI agent is not given any explicit rewards but instead explores the environment driven by its own intrinsic motivation. This motivation can be thought of as a form of curiosity: the agent seeks out novel and informative experiences to learn about the world around it. By removing the constraints of a predefined reward function, the agent is free to discover and learn in a more open-ended and flexible manner.
The research by Rigter, Jiang, and Posner takes this idea a step further by introducing a new algorithm called WAKER (Weighted Acquisition of Knowledge across Environments for Robustness). WAKER enables the agent to build robust world models by actively seeking out environments where it predicts it will have the greatest errors or uncertainties, because this is where it stands to learn the most.
In the context of reward-free exploration, the agent aims to build a robust world model by actively seeking out the most complex or uncertain environments. These environments are where the agent expects to have the highest errors or uncertainties in its current world model, indicating a significant opportunity for learning and improvement. By focusing its exploration on these informative environments, the agent can efficiently gather the data needed to refine and expand its world model.
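A simplified sketch of this idea, loosely in the spirit of WAKER but with invented weights and environment names rather than the paper's exact method, is to sample the next training environment in proportion to the world model's estimated error in each one:

```python
import numpy as np

def choose_environment(env_ids, estimated_errors, temperature=1.0, rng=None):
    """Pick the next environment to explore, favoring those where the
    world model's estimated error (uncertainty) is highest.

    This is an illustrative heuristic, not the exact weighting used by WAKER.
    """
    rng = rng or np.random.default_rng()
    errors = np.asarray(estimated_errors, dtype=float)
    weights = np.exp(errors / temperature)   # softmax over estimated errors
    probs = weights / weights.sum()
    return rng.choice(env_ids, p=probs)

# Example: three environments with different estimated world-model errors.
envs = ["corridor", "maze", "open_field"]
errors = [0.05, 0.40, 0.15]   # the maze is where the model is most uncertain
print(choose_environment(envs, errors))   # most often prints "maze"
```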
However, exploring complex and uncertain environments also comes with the risk of making suboptimal decisions. An objective known as minimax regret addresses this by ensuring that the agent minimizes its maximum regret across all possible environments. Regret, in this context, measures the difference between the optimal performance achievable in a given environment and the agent's actual performance there.
By minimizing the maximum regret, the agent effectively balances the trade-off between exploration and exploitation. It ensures that the agent's decisions are robust and near-optimal across a wide range of environments, even in the face of uncertainty. This approach allows the agent to learn from its mistakes and continuously improve its decision-making abilities. In essence, the agent learns to "imagine" the outcomes of its actions and can use this knowledge to plan and make decisions in new situations.
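In rough terms, using toy numbers of my own rather than results from the paper: regret in an environment is the gap between the best achievable performance there and a policy's actual performance, and the minimax-regret choice is the policy whose worst such gap is smallest.

```python
# Toy minimax-regret comparison across environments (illustrative numbers only).
# regret(policy, env) = optimal performance in env - policy's performance in env
optimal = {"corridor": 10.0, "maze": 8.0, "open_field": 12.0}

performance = {
    "policy_A": {"corridor": 9.5, "maze": 3.0, "open_field": 11.0},
    "policy_B": {"corridor": 8.5, "maze": 7.0, "open_field": 10.5},
}

def max_regret(policy_scores):
    return max(optimal[env] - score for env, score in policy_scores.items())

# policy_A: max regret = 5.0 (it collapses in the maze)
# policy_B: max regret = 1.5 (slightly worse everywhere, but robust)
best = min(performance, key=lambda p: max_regret(performance[p]))
print(best)  # "policy_B" -- the minimax-regret choice
```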
The combination of seeking out complex and uncertain environments while minimizing the maximum regret enables the agent to build a comprehensive and reliable world model. This robust world model can then be used to train policies for various tasks, ensuring strong performance and adaptability in the face of new challenges. The minimax regret objective is thus a crucial component in the pursuit of open-ended, curiosity-driven AI that can autonomously learn and adapt to the complexities of the real world.
This research caught my eye because it has potentially important implications for human-AI interaction and creativity. An AI that is "curious" about the user will be more useful and potentially a more creative partner. By leveraging its robust world model, the agent could suggest novel ideas, challenge assumptions, and help the user explore their creative process more widely.
Curiosity also enables an intelligence that is good at finding its own relevant data rather than relying on vast amounts of supplied data. Agents that are intrinsically motivated to explore and learn about the world around them will inevitably be more data-efficient. Such reward-agnostic agents could then be applied to a wide range of domains. And this idea of "open-endedness" leads to a quite different AI world.
Open-endedness is the ability of an AI to continuously evolve, learn, and adapt to new situations without being constrained by predefined objectives or limitations. It is crucial for creating systems that can handle the complexity and unpredictability of real-world environments. Truly intelligent and adaptive AI will require open-endedness: systems that can explore their environment, discover novel solutions, and continuously expand their knowledge and capabilities. This research implicitly includes open-endedness as a design feature by virtue of its reward-free nature.
The research by Rigter, Jiang, and Posner represents an important shift, moving us closer to creating machines that not only learn but are intrinsically motivated to explore and understand the world around them. This approach opens up new possibilities for AI to become a more adaptive, creative, and efficient partner in various fields. By embracing the essence of curiosity and the pursuit of open-endedness, we can develop AI systems that navigate the complexities of the real world with more autonomy and insight.