AI Agents, Mathematics, and Making Sense of Chaos
The ability to translate complex, technical concepts into accessible, non-technical language for a range of audiences is an important human skill. Imagine two experts from vastly different fields, say a technical designer and a marketing professional, coming together to co-create a product presentation for a non-expert audience. Their challenge is not just to share knowledge: to be an effective team, they must develop a shared language that connects their disparate domains.
This collaborative process of translation and adaptation mirrors a new approach put forward by Google DeepMind. By augmenting large language models with each other, Composition to Augment Language Models (CALM) effectively enables these AI 'experts' to develop a common language, enhancing their ability to tackle complex tasks through collaborative intelligence.
In a business setting, this could revolutionize how companies interact with data. For instance, a retail giant could use CALM to blend a model specialized in analyzing customer behavior patterns with another that excels in generating engaging product descriptions. This would create an AI system capable of not only identifying emerging market trends but also creating compelling marketing content tailored to those trends, all in real-time.
This fusion of technical analysis and creative communication could transform data-driven strategies: traditional predictive AI systems could be paired seamlessly with generative ones to create real-time narratives tailored to different audiences, each with its own terminology and the mental models that underpin its decision making.
CALM combines a general-purpose anchor model with specialized augmenting models to create new capabilities not achievable by either model individually. For example, it can combine the code understanding of one model with the language generation skills of another to enable code-to-text generation.
CALM doesn't require updating the individual models; instead, it learns a dense interaction between them. The design is intended to be simple and efficient: it requires only a small amount of additional data representing the combined capabilities of the models involved, which conserves resources while preserving the integrity and strengths of each individual model. This is consistent with Google's long-term development approach, which places a premium on the pursuit of ever-more efficient compute.
Technically, CALM introduces a small number of trainable parameters that operate over the intermediate-layer representations of both the anchor and augmenting models. This enables a deeper integration, allowing new, complex tasks to be performed that neither model could achieve independently. Think of it as adding a small interpreter between two experts who speak different languages, enabling them to collaborate on a problem without either having to learn the other's language from scratch.
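To make the "interpreter" idea concrete, here is a minimal sketch in PyTorch. It is not DeepMind's implementation: the class name CALMBlock, the dimensions, and the choice of a single cross-attention layer are illustrative assumptions. But the shape of the mechanism matches the description above: a small set of trainable parameters that lets the anchor model's intermediate representations attend to the augmenting model's.

```python
import torch
import torch.nn as nn

class CALMBlock(nn.Module):
    """Hypothetical bridge module: a few trainable parameters that let
    the anchor model's hidden states attend to the augmenting model's."""

    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        # Project the augmenting model's states into the anchor's width.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        # Cross-attention: queries come from the anchor, keys and values
        # from the (projected) augmenting model.
        self.cross_attn = nn.MultiheadAttention(
            anchor_dim, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        kv = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=kv, value=kv)
        # Residual connection: the anchor's own representation is
        # preserved and enriched, never overwritten.
        return self.norm(anchor_h + attended)
```

Notice that neither base model appears inside the module at all; only the bridge's projection and attention weights are trainable, which is what keeps the approach data- and compute-efficient.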
Cross-attention is a mechanism that allows one model to "attend" to the information processed by another model, fostering a deeper level of interaction and knowledge exchange between them. Here's how CALM puts cross-attention to work.
For instance, imagine one AI model trained to understand complicated legal documents and another that excels at plain-language summarization. Through cross-attention, the summarization model can focus on the specific, relevant pieces of the legal model's understanding. This complementary, back-and-forth processing allows for summaries that are both legally accurate and accessible to non-experts.
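Continuing the legal-summarization example, the sketch below reuses the hypothetical CALMBlock above to show the training setup described earlier: both pretrained models stay frozen and only the small bridge is optimized. The tiny encoders and random tensors here are stand-ins, not real models or data; the point is where the gradients do and do not flow.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the two pretrained models (hypothetical; in practice
# these would be loaded from checkpoints). Both are frozen below.
legal_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
summary_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
for model in (legal_encoder, summary_encoder):
    for p in model.parameters():
        p.requires_grad_(False)

bridge = CALMBlock(anchor_dim=32, aug_dim=64, num_heads=4)
optimizer = torch.optim.AdamW(bridge.parameters(), lr=1e-4)

# One illustrative training step on random tensors standing in for the
# embedded legal document and summarization prompt.
doc = torch.randn(2, 16, 64)     # batch of "legal document" states
prompt = torch.randn(2, 8, 32)   # batch of "summary prompt" states
target = torch.randn(2, 8, 32)   # placeholder training target

with torch.no_grad():            # no gradients reach the frozen models
    aug_h = legal_encoder(doc)
    anchor_h = summary_encoder(prompt)

fused = bridge(anchor_h, aug_h)  # only the bridge is trainable
loss = nn.functional.mse_loss(fused, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the base models sit behind torch.no_grad(), only the bridge's small parameter set is ever updated, mirroring the claim that CALM composes models without retraining them.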
This framework is also effective in scenarios requiring access to multiple types of specialized knowledge stored in different models. For instance, a foundational LLM could be augmented with models containing proprietary data or expertise, enhancing its reasoning, world knowledge, and language generation in specific domains. The technique also enables reuse of existing models and their capabilities, which in turn offers better control, helps avoid catastrophic forgetting, and provides flexibility across organizational boundaries.
By enabling different AI models to 'speak' to each other and combine their strengths, CALM opens up new possibilities for solving complex problems across various domains. This method reveals a practical approach for building AIs that can tackle broader arrays of tasks with expertise and precision, in a data and compute efficient way.