AI Agents, Mathematics, and Making Sense of Chaos
Before AI can reason and plan, it needs some "System 2" thinking.
At a glance:
Reasoning and planning are something of a holy grail in AI, and they are required for higher-level intelligence, sensemaking, and decision making under uncertainty. An important interim step on the route to a general intelligence is to refine a machine's attention strategies so that AIs are more context aware, hallucinate less, and are better able to distinguish spurious correlations from salient patterns in text. Think of this as being able to direct your attention so that you distinguish what matters about the world from what merely appears in front of you. You see trees, but what matters is that there is a forest fire coming over the ridge.
Because AI excels at detecting correlative patterns far more than it does at grasping causation and meaning, AI researchers are constantly looking for incremental ways to improve AI's ability to discern what matters. In a recent paper from Meta, researchers develop a dual mechanism for attention that functions analogously to Daniel Kahneman's System 1 and System 2 cognition: fast, intuitive, and usually "good enough" versus slow, deliberate, and less error-prone. In addition to leveraging Kahneman, the researchers also riff on Steven Sloman's work on directed attention for situations when deliberate attention is required to reduce errors.
The research underscores an evolving view of how to optimize AI systems. Rather than relying solely on better data or model architecture, we can prompt models to replicate pragmatic human thought processes like focused, deliberate attention, as mirrors of intelligence.
We asked Sloman how cognitive science and AI inform each other. He is skeptical about cognitive science's contribution to AI beyond initial inspiration. However, he sees great opportunity for AI research to provide deeper insight into human cognition.
While cognitive science helped spark ideas for early AI, Sloman believes insights from cutting-edge AI could greatly further our understanding of the human mind. By analyzing the mechanisms and representations learned by advanced neural networks, we may unlock new revelations about our own mental processes. Though the connection was mainly one-way historically, with cognition aiding AI, Sloman argues the tide has turned. "There are a lot of parallels between what is going on in AI and what is going on in cognitive science," he says. "I actually think that the development of the large language models is by far the biggest source of insight into cognitive process that we've seen in decades."
Sloman has been particularly surprised by large language models' ability to reason causally and counterfactually, a capacity previously associated with human reasoning, not machines. Sloman's work holds that people reason in two ways: one based on memory (intuitive, fast, "System 1," pattern completion) and one based on deliberation (analytical, slow, "System 2," formal inference rules). This dichotomy gives Sloman a lens for analyzing how large language models might reason. Having ingested humanity's digitized knowledge, large language models possess immense "memory" and can reason well intuitively. Yet their capacity for causal and counterfactual reasoning, deriving conclusions that are not memory-based, is striking. "The fact the machines can now [reason these ways] without having those deliberative skills is kind of shocking," Sloman says.
And yet LLMs still make dumb mistakes. Standard "soft" attention is an automatic process, ingrained in models: it assigns probability across the entire context based on learned distributions and correlations, with no mechanism for deliberate control. As a result, distracting, irrelevant, or opinionated information in a prompt gets folded into the computation, skewing reasoning and increasing the likelihood of errors.
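To see why soft attention can't simply ignore a distractor, here is a minimal numpy sketch of ordinary dot-product attention. The vectors, function name, and the toy "distractor" token are all assumptions made for illustration; this is not the paper's code or any particular model's implementation. The point is only that the softmax assigns nonzero weight to every token in the context, relevant or not.

```python
import numpy as np

def soft_attention_weights(query, keys):
    """Standard dot-product soft attention: every key receives a nonzero weight."""
    scores = keys @ query / np.sqrt(len(query))  # similarity of each context token to the query
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()               # softmax: a probability over ALL tokens

# Toy context: three "relevant" token vectors and one irrelevant distractor.
rng = np.random.default_rng(0)
relevant = rng.normal(size=(3, 8))
distractor = rng.normal(size=(1, 8))
keys = np.vstack([relevant, distractor])
query = relevant.mean(axis=0)                    # the query resembles the relevant tokens

print(soft_attention_weights(query, keys))       # the distractor still gets weight > 0
```

There is no step at which the model decides to exclude the distractor; it can only downweight it, and learned correlations sometimes upweight it instead.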
Absent conscious attention, is there a way to give AI a System 2 attention mechanism so that the AI knows when to slow down and “think” more?
A technique the researchers call “System 2 Attention” allows models to deliberately regenerate only appropriate context to focus on before generating a response. This mimics goal-driven top-down attention in humans. It is effortful and slower, but removes distraction akin to how we consciously direct focus while ignoring irrelevant stimuli when needed.
Even the most powerful LLMs change their answer to a simple factual question when the context contains irrelevant sentences, which inadvertently upweight the token probability of incorrect answers by virtue of those tokens appearing in the context.
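To make that concrete, here is a hypothetical illustration of the kind of distraction at issue. The question and the distractor sentence are invented for this piece, not taken from the paper.

```python
# The same factual question, with and without an irrelevant but suggestive sentence.
clean_prompt = "In which city was the physicist Enrico Fermi born?"

distracting_prompt = (
    "My friend thinks Chicago is the greatest city in the world, and she never "
    "stops talking about Chicago.\n"
    "In which city was the physicist Enrico Fermi born?"
)
# The correct answer (Rome) hasn't changed, but the paper's finding is that context
# like this measurably raises the probability of an answer that echoes the distractor.
```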
S2A introduces a deliberative attention process by prompting the model to regenerate only the relevant context to focus on. This is slower but careful, mimicking purposeful System 2 thinking when we know System 1 could make mistakes.
The S2A process works in two steps. First, the model regenerates the given context (x) to create a new context (x′) that excludes irrelevant parts, so that only relevant material is considered when generating a response, minimizing the influence of distracting or unrelated information. Then the LLM uses this refined context (x′), instead of the original, to produce the final response, making the answer more focused, more relevant, and less prone to errors or biases that might arise from irrelevant parts of the initial context.
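A minimal sketch of that two-step loop, assuming a generic `llm(prompt)` completion call; the function and the prompt wording here paraphrase the idea rather than reproducing the paper's exact instructions.

```python
def llm(prompt: str) -> str:
    """Placeholder for whatever chat/completion API you use (one model call per invocation)."""
    raise NotImplementedError

def s2a_answer(context: str, question: str) -> str:
    # Step 1: ask the model itself to regenerate only the relevant context (x -> x').
    rewrite_prompt = (
        "Extract from the text below only the parts that are relevant and unbiased "
        "for answering the question. Do not add opinions or new facts.\n\n"
        f"Text: {context}\n\nQuestion: {question}\n\nRelevant context:"
    )
    focused_context = llm(rewrite_prompt)

    # Step 2: answer using the regenerated context x' instead of the original x.
    answer_prompt = f"Context: {focused_context}\n\nQuestion: {question}\n\nAnswer:"
    return llm(answer_prompt)
```

The extra model call is what makes S2A slower and more expensive than ordinary, System 1-style decoding.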
So in a nutshell:
The key move is leveraging the LLM's own reasoning capacity to deliberately regenerate a more focused context, rather than passing the original context straight through. This provides a form of controlled, goal-driven attention comparable to purposeful human thinking. Injecting S2A prompts is like consciously triggering deeper thought: decide what context is really relevant right now. It adds computational expense, but reduces mistakes.
In tests, models with S2A increased factual accuracy on questions with distracting opinions from 62.8% to 80.3%. Arguments stayed high quality but objectivity improved by 57.4% despite interjected sentiment. Accuracy on math word problems rose from 51.7% to 61.3% when ignoring irrelevant sentences.
It’s as if S2A puts neural “horse blinders” on models, narrowing their attention. Like using music to block out that infuriating racket from leaf blowers when you need to concentrate. Or intentionally not looking at your phone during dinner despite being tempted to.
S2A is one step toward models that know when to focus and think slow.