AI Agents, Mathematics, and Making Sense of Chaos
From Artificiality This Week
* Our Gathering: Our Artificiality Summit 2025 will be held on October 23-25 in Bend, Oregon.
Of all the interesting parts of Google’s Gemini announcement, one is keeping me up at night wondering about the possibilities for the future: dynamic coding.
Go watch this video and then come back. I'll wait—what I have to say will mean a lot more if you watch it first.
In this video, you can see how Gemini creates a bespoke interface through multiple steps: reasoning about what a user desires, clarifying ambiguity, and creating a novel UX. This is all without any specific human coding—a new capability that I believe provides a peek into a new, dynamic future with machines.
Some commentators have criticized Google's Gemini announcement because some videos showed aspirational concepts for the future that are well beyond current capabilities. They have also brushed off the video you just watched, saying things like, "how useful is it to create a novel UX just to find a cupcake?"
While it's fair to be critical of a company exaggerating current capabilities, I think these commentators are missing the big picture. Back in the 1990s, when we were waiting for a static web page of text to load over our 14.4 or 28.8 kbps modems, we would never have been able to predict a future of pinching and zooming on a full-glass screen iPhone to get a closer view of a streaming HD video. It's easy to dismiss today's capabilities just like some dismissed buying Beanie Babies on eBay—missing the moment that foretold a multi-trillion dollar e-commerce industry 30 years later. I believe we're at an analogous moment with dynamic design. Today's Gemini demo is just today—the important question is what might happen over the next decades.
Today’s software is generally responsive to a set of user choices. Interfaces respond to the size of a screen—desktop, tablet, phone—and to specific user commands—opening or closing panels, etc. All of these changes are programmed ahead of time within constraints that are understood, i.e., we know the various screen sizes, font options, etc. Gemini’s dynamic coding creates an entirely new opportunity—the opportunity for an AI, through chat, to discover what a user wants to accomplish and then dynamically create an interface-for-one with capabilities designed solely for that person to accomplish that task in that moment.
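To make that concrete, here is a minimal sketch of one way dynamic coding could work under the hood, assuming a pattern in which the model emits a declarative UI spec and a generic client renders whatever composition it receives. Every type and function here (the UiSpec shape, the generateUiSpec stub) is invented for illustration; none of it is Gemini's actual API.

```typescript
// Hypothetical sketch: a model emits a declarative spec, a generic client renders it.
// The types and the generateUiSpec stub are invented for illustration only.

type UiNode =
  | { kind: "heading"; text: string }
  | { kind: "imageGrid"; query: string; count: number }
  | { kind: "dropdown"; label: string; options: string[] }
  | { kind: "button"; label: string; action: string };

interface UiSpec {
  title: string;
  nodes: UiNode[];
}

// Stand-in for a model call: in a real system this spec would be generated
// from the conversation, not hard-coded.
function generateUiSpec(userRequest: string): UiSpec {
  return {
    title: `Ideas for: ${userRequest}`,
    nodes: [
      { kind: "heading", text: "Birthday party themes" },
      { kind: "dropdown", label: "Favorite animal", options: ["unicorns", "dinosaurs", "owls"] },
      { kind: "imageGrid", query: "animal-themed cakes", count: 6 },
      { kind: "button", label: "Save these ideas", action: "save" },
    ],
  };
}

// A generic renderer: the client knows how to draw each node kind,
// but the composition is decided at runtime by the model.
function render(spec: UiSpec): string {
  const lines = [`== ${spec.title} ==`];
  for (const node of spec.nodes) {
    switch (node.kind) {
      case "heading":
        lines.push(`# ${node.text}`);
        break;
      case "imageGrid":
        lines.push(`[${node.count} images for "${node.query}"]`);
        break;
      case "dropdown":
        lines.push(`${node.label}: <${node.options.join(" | ")}>`);
        break;
      case "button":
        lines.push(`(${node.label})`);
        break;
    }
  }
  return lines.join("\n");
}

console.log(render(generateUiSpec("my daughter's birthday party")));
```

The interesting part of this pattern is the inversion: the client still owns a small vocabulary of components, but the composition, labels, and options are decided at runtime, per person, per task.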
Given the dynamic back-and-forth of conversational interfaces, the AI and human can iteratively change this software interface in partnership along their task journey. Gemini opens an opportunity to leave behind today's static, faux-personalized interfaces for a new, temporal, and dynamic richness in our experience with intelligent machine companions.
In the demo, we see that Gemini reasons that the user might want to see images, given that he’s interested in birthday party ideas for his daughter. It also reasons that he might want to see different kinds of animals, given that his initial request was ambiguous about his daughter’s favorite animals. In response, it dynamically creates drop-down menus so that the user can choose the kind of animal. While I’m pretty impressed by the admittedly simple use case, let me explain why this makes my mind whir, where it might lead, and why it is groundbreaking.
Let's take something more challenging than the demo, like editing a podcast. An AI might create a novel interface that fits our podcast format. It might start by giving me a combined audio and text display of the interview portion. It might then prompt me to record an intro based on a script it automatically creates. It might create an easy-to-use interface to stitch in music at the beginning and end. It might offer to take care of all the publishing tasks, including sending and checking test emails. And none of this needs to happen in a timeline-based interface like GarageBand, since that's far more than I require for this task. Nothing against GarageBand—it was created on my team at Apple—but it is designed to do many things for many people, while the AI might create an interface designed for only me to do only my thing. The AI might also bring the task to me, prompting me to finish my sub-tasks wherever I am to stay on schedule, rather than requiring me to go to the task by sitting at my computer.
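To sketch what such an interface-for-one might reduce to under the hood, here is a hypothetical workflow spec an AI could assemble for a single episode. The step names mirror the example above; every type and identifier is invented for illustration and nothing here corresponds to a real product API.

```typescript
// Hypothetical sketch of a single-user workflow an AI might assemble for one episode.
// All types and step names are invented for illustration.

interface WorkflowStep {
  id: string;
  description: string;
  surface: "audio-text-editor" | "record-prompt" | "music-stitcher" | "publish-checklist";
  dueHoursFromNow: number;
  done: boolean;
}

const episodeWorkflow: WorkflowStep[] = [
  { id: "edit", description: "Trim the interview in a combined audio/text view",
    surface: "audio-text-editor", dueHoursFromNow: 24, done: true },
  { id: "intro", description: "Record the intro from the drafted script",
    surface: "record-prompt", dueHoursFromNow: 30, done: false },
  { id: "music", description: "Stitch in opening and closing music",
    surface: "music-stitcher", dueHoursFromNow: 36, done: false },
  { id: "publish", description: "Send and check the test email, then publish",
    surface: "publish-checklist", dueHoursFromNow: 48, done: false },
];

// "Bring the task to me": pick the next unfinished step and phrase it as a prompt
// the assistant could surface on whatever device the user is holding.
function nextPrompt(steps: WorkflowStep[]): string | null {
  const next = steps
    .filter((s) => !s.done)
    .sort((a, b) => a.dueHoursFromNow - b.dueHoursFromNow)[0];
  return next
    ? `Next up (${next.dueHoursFromNow}h left): ${next.description}; opens the ${next.surface} view.`
    : null;
}

console.log(nextPrompt(episodeWorkflow));
```

In this framing, "bring the task to me" is simple: the assistant, not the user, keeps track of what is due next and surfaces it wherever the user happens to be.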
A paper published this week by Microsoft researchers studying Microsoft products claims that "one of the most critical complementary innovations that will be needed [to] maximize productivity gains will be entirely reinvented workflows with new and rearranged tasks. Tools that are chat-centered (vs. app-centered) and agent-like technologies may mark the beginning of these new workflows". This is a rather bland way of saying "oh. my. god. searching is now exploring."
We've known for years—as we've written about many times—that very few endeavors are undertaken in a linear fashion. Even finding inspiration for a child's birthday party is a complex mix of set tasks and peripatetic wandering within a goal-directed, conceptually governed process. Jobs-to-be-done aren't made up of tasks flowing like water in an irrigation canal—laminar, predictable, and moving at constant speed. Jobs-to-be-done are turbulent—non-linear, variable, and the unpredictability of where we end up is part of what makes us human. Gemini is day zero for what "chat-centered" might look like, and what it shows is that generative AI should help us hit the sweet spot between discovering the new and just getting shit done.
All of this is speculative. And it’s impossible to accurately predict where this technology will go. But, as someone who worked on dozens of software products over multiple technology eras, this change feels like it will upend the way we conceptualize application design. Yes, there will likely be good reasons to continue designing software that is reactive to many users doing many things. But there will also be countless opportunities to design dynamic software that creates a novel workflow for one user’s individual task. And once AI builds an intimate understanding of each of us, that dynamism has the potential to personalize every interaction we have with our machines.
In this subscriber-only episode, we dive into the exciting announcement of Google's new foundation model for AI, Gemini, exploring three key aspects of this important, new technology: benchmarks, applications, and size and scale.
Given our interest in the idea of LLMs being able to help people make better decisions—which often requires more deliberative thinking—we asked Steve Sloman to come back on the podcast to get his reaction to new research from Meta that refers to one of his research papers from 1996. Once again, we had a dynamic conversation about human cognition and modern AI, which field is learning what from the other, and a few speculations about the future. We're grateful to Steve for taking the time to talk with us again and hope that he'll join us for a third time when his next book is released sometime in 2024.
Helen digs into a new paper from Meta researchers that takes inspiration from theories of fast vs. slow human cognition and proposes a "System 2 Attention" (S2A) technique, in which the model deliberately regenerates the relevant context it should focus on. This is particularly good reading to understand the details behind our conversation with Steve Sloman.
The Artificiality Weekend Briefing: About AI, Not Written by AI