AI Agents, Mathematics, and Making Sense of Chaos
The ability to translate complex, technical concepts into accessible, non-technical language for a range of audiences is an important human skill. Imagine two experts from vastly different fields, say a technical designer and a marketing professional, coming together to co-create a product presentation for a non-expert audience. Their challenge is not just to share knowledge: to be an effective team, they must develop a shared language that connects their disparate domains.
This collaborative process of translation and adaptation mirrors a new approach put forward by Google DeepMind. By augmenting large language models with each other, Composition to Augment Language Models (CALM) effectively enables these AI 'experts' to develop a common language, enhancing their ability to tackle complex tasks through collaborative intelligence.
In a business setting, this could revolutionize how companies interact with data. For instance, a retail giant could use CALM to blend a model specialized in analyzing customer behavior patterns with another that excels in generating engaging product descriptions. This would create an AI system capable of not only identifying emerging market trends but also creating compelling marketing content tailored to those trends, all in real-time.
This fusion of technical analysis and creative communication could transform data-driven strategies: traditional predictive AI systems could be paired seamlessly with generative ones to create real-time narratives tailored to different audiences, each with its own terminology and the mental models that underpin its decision making.
CALM combines a general-purpose anchor model with specialized augmenting models to create new capabilities not achievable by either model individually. For example, it can combine the code understanding of one model with the language generation skills of another to enable code-to-text generation.
CALM doesn't require updating the individual models; instead, it learns a dense interaction between them. The design is intended to be simple and efficient: it requires only a small amount of additional data representing the combined capabilities of the models involved, which conserves resources while preserving the integrity and strengths of each individual model. This is consistent with Google's long-term development approach, which places a premium on the pursuit of ever-more efficient compute.
Technically, CALM introduces a small number of trainable parameters that operate over the intermediate-layer representations of both the anchor and augmenting models. This enables a deeper integration, allowing new, complex tasks to be performed that neither model could achieve independently. Think of it as adding a small interpreter between two experts who speak different languages, enabling them to collaborate on a problem without either having to learn the other's language from scratch.
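To make the "interpreter" idea concrete, here is a minimal sketch in PyTorch. It is not DeepMind's implementation: the class name CALMBlock, the dimensions, and the choice of a single cross-attention layer are illustrative assumptions. But the shape of the mechanism matches the description above: a small set of trainable parameters that lets the anchor model's intermediate representations attend to the augmenting model's.

```python
import torch
import torch.nn as nn

class CALMBlock(nn.Module):
    """Hypothetical bridge module: a few trainable parameters that let
    the anchor model's hidden states attend to the augmenting model's."""

    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        # Project the augmenting model's states into the anchor's width.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        # Cross-attention: queries come from the anchor, keys and values
        # from the (projected) augmenting model.
        self.cross_attn = nn.MultiheadAttention(
            anchor_dim, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        kv = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=kv, value=kv)
        # Residual connection: the anchor's own representation is
        # preserved and enriched, never overwritten.
        return self.norm(anchor_h + attended)
```

Notice that neither base model appears inside the module at all; only the bridge's projection and attention weights are trainable, which is what keeps the approach data- and compute-efficient.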
Cross-attention is a mechanism that allows one model to "attend" to the information processed by another model, fostering a deeper level of interaction and knowledge exchange between them. Here's how CALM puts cross-attention to work.
For instance, imagine one AI model trained to understand complicated legal documents and another that excels at plain-language summarization. Through cross-attention, the summarization model can focus on the specific, relevant pieces of the legal model's understanding. This complementary, back-and-forth processing allows for summaries that are both legally accurate and accessible to non-experts.
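Continuing the legal-summarization example, the sketch below reuses the hypothetical CALMBlock above to show the training setup described earlier: both pretrained models stay frozen and only the small bridge is optimized. The tiny encoders and random tensors here are stand-ins, not real models or data; the point is where the gradients do and do not flow.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the two pretrained models (hypothetical; in practice
# these would be loaded from checkpoints). Both are frozen below.
legal_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
summary_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
for model in (legal_encoder, summary_encoder):
    for p in model.parameters():
        p.requires_grad_(False)

bridge = CALMBlock(anchor_dim=32, aug_dim=64, num_heads=4)
optimizer = torch.optim.AdamW(bridge.parameters(), lr=1e-4)

# One illustrative training step on random tensors standing in for the
# embedded legal document and summarization prompt.
doc = torch.randn(2, 16, 64)     # batch of "legal document" states
prompt = torch.randn(2, 8, 32)   # batch of "summary prompt" states
target = torch.randn(2, 8, 32)   # placeholder training target

with torch.no_grad():            # no gradients reach the frozen models
    aug_h = legal_encoder(doc)
    anchor_h = summary_encoder(prompt)

fused = bridge(anchor_h, aug_h)  # only the bridge is trainable
loss = nn.functional.mse_loss(fused, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the base models sit behind torch.no_grad(), only the bridge's small parameter set is ever updated, mirroring the claim that CALM composes models without retraining them.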
This framework is also effective in scenarios requiring access to multiple types of specialized knowledge stored in different models. For instance, a foundational LLM could be augmented with models containing proprietary data or expertise, enhancing its reasoning, world knowledge, and language generation in specific domains. The technique also enables reuse of existing models and their capabilities, which in turn offers better control, helps avoid catastrophic forgetting, and provides flexibility across organizational boundaries.
By enabling different AI models to 'speak' to each other and combine their strengths, CALM opens up new possibilities for solving complex problems across various domains. This method reveals a practical approach for building AIs that can tackle broader arrays of tasks with expertise and precision, in a data and compute efficient way.