The Design Illusion of LLMs


Aesthetics does not equal design... but aesthetics can mask the absence of design.

Listen to an NPR-style discussion of this article, created by Google NotebookLM.


Despite impressive capabilities, there is a disconnect between what’s expected and what’s actually delivered by large language models. This gap is largely due to a fundamental design illusion—one that makes these technologies look polished and ready for prime time but hides the real complexity and shortcomings underneath—and two category errors that have led these products to be designed in the wrong way.

💡
This article is part of a larger presentation on AI & design. Get in touch if you're interested in us presenting more of our research to your organization.

The Design Illusion: Looks Aren’t Everything

It’s easy to fall prey to the design illusion that because LLMs look sleek, they must be well-designed. But aesthetics alone do not equal design. As Steve Jobs once said, “Design is not just what it looks and feels like. Design is how it works.”

"Design is not just what it looks and feels like. Design is how it works." --Steve Jobs

For instance, this pencil at first clearly states its message: “Too Cool to Do Drugs.”

An image of a pencil that says "too cool to do drugs"

But as you use and sharpen the pencil, you can’t miss its poor design as the message is shortened to “Do Drugs.” It looks good on the surface but doesn't function well.

An image of four pencils, each shorter than the last. The first says "too cool to do drugs" and the last only says "drugs"

Generative AI models like ChatGPT and Claude may seem polished. They have clean interfaces and aesthetically pleasing designs, but that doesn’t mean they’re optimized for the user experience. True design isn't about making something look good—it's about ensuring it works seamlessly, and in the case of LLMs, that it helps people solve problems in an intuitive way.

LLMs only appear intuitive—and that’s due to two category errors.

Consumer vs Professional

The first category error is mistaking a professional product for a consumer product. Consumer products should require little to no learning; they are intuitive, like many apps on your phone. Professional products, however, need expertise and training, because they aren’t immediately intuitive. LLMs are being designed, presented, and marketed like consumer products, even though they function more like professional ones. They require a steep learning curve, from understanding how the models work to how to best prompt in any particular use case.

Rather than designing LLMs to signify this required learning curve, these tools have been designed with the illusion that they operate like other well-understood tools like search engines. This disconnect leaves users confused, thinking these tools should be simple to use when, in fact, they require substantial expertise.

"Do your customers understand how this works?" --Steve Jobs

Each time OpenAI or Anthropic drops a new feature, I hear Steve Jobs asking, “Do your customers understand how this works?” With ChatGPT and Claude, the answer is almost universally “no” because these tools aren’t intuitive: users naturally assume they are simple consumer tools, while important functionality hidden underneath goes unexplained.

It’s like they see this simple light switch, thinking it’s like every other light switch they’ve seen.

An image of a light switch

But it obviously is more complex than the initial presentation.

An image of a light switch with a piece of paper above that says "Please do not turn off the light switch. It also operates the elevator. Thank you."

Windows vs Rooms

The second category error is designing LLMs like windows rather than rooms.

Windows are tools that provide a view into something, often something “on the other side” of the window. Think of Google helping you find other websites, eBay helping you find products, Airbnb helping you find a property to rent, or Spotify helping you find a new song. These windows help you navigate through their interface to find something you likely already understand.

A set of screenshots of "window" tools like Netscape Navigator, eBay, Airbnb, etc.

Rooms are spaces where you work, collaborate, and create—like Photoshop, Slack, or Tableau. These are environments where the user stays and interacts extensively. A user might bring information into the room but the objective is to stay in the room—exploring, learning, and creating.

A set of screenshots of "room" tools like Slack, Notion, Photoshop, and Tableau

Generative AI tools like ChatGPT and Claude are designed as if they are windows, but in reality, they are rooms. The simplicity of the prompt field is an echo of searching in Google or eBay where the tool is helping a user find something “out there.” But, in reality, its search is more like searching in Slack or Notion, finding something “in here.”

These tools are also easily mistaken for windows because they lack the functionality rooms need. They should be environments where users create, explore, and experiment, but the current design doesn’t support this. For example, their search capabilities remain rudimentary, and there is no way to combine or edit text to actually create something. ChatGPT’s memory function is potentially a powerful room-like feature, allowing the tool to improve the more the user works in the room. But the reality falls short: how memory works is opaque, and the user controls are buried.

A screenshot of ChatGPT with a question: window or room?

Designing consumer products based on step changes in technology is hard. I was fortunate to work through one of these transitions 20 years ago as we designed new consumer products that connected to external devices. Apple’s iApps broke new ground, providing intuitive designs for consumers to connect to a photo camera through iPhoto, a video camera through iMovie, and, of course, to the iPod through iTunes. These tools created experiences with media that were novel for most people. Creating a seamless experience that “just worked” was devilishly difficult.

Designing for generative AI is entirely new and presents a level of design difficulty we haven’t faced since the shift to GUI interfaces 40 years ago. Today, we are designing systems capable of conversation, intelligence, and unpredictability: traits we’ve never had to design for before. This shift from command-based interfaces to intent-based interfaces opens an entirely new opportunity that is both exciting and daunting.

What does it mean to design for intelligence, conversation, and unpredictability? We don't know yet, but it isn't this. (A screenshot of ChatGPT)

Unsurprisingly, the companies building these products haven’t figured it all out yet. Perhaps admitting this gap is the key—and making sure to present LLMs as the complex, professional rooms that they are, not the simple, consumer windows that companies hope will increase new user growth and real-world impact.
