AI Agents, Mathematics, and Making Sense of Chaos
From Artificiality This Week: Our Artificiality Summit 2025 will be held October 23-25 in Bend, Oregon.
Recent studies claim AI outperforms humans in creativity tests, but these only measure "creative potential." Examples show AI ideas often lack practicality and appeal. The future of AI-enhanced creativity lies in designing tools that allow for exploration, playfulness, and guidance.
Recently, several papers have been published with headlines claiming that AI is more creative than humans. Personally, these headlines make me anxious. I deeply value creativity as an expression of individuality and humanity. My bias is to view our entire human endeavor as centered on creativity—not just in art, music, and traditionally creative professions, but also in how we advance knowledge and solve complex problems.
So, my heart sinks a little each time I read another headline suggesting that GPT-4 surpasses all but the most creative humans. Is AI actually more creative? On key measures of creativity, specifically divergent thinking tasks, it seems so. But we shouldn't be surprised by this finding. Divergent creativity tasks like the Alternative Uses Task (AUT), the Consequences Task (CT), and the Divergent Associations Test (DAT) are fundamentally language-based. Therefore, it's not surprising that large language models excel at these tasks, which measure the semantic distance between words generated in the test.
For example, the AUT asks participants (or AI) to generate alternative uses for common objects, such as a fork. The CT asks for potential consequences of hypothetical scenarios, like "What if humans no longer needed to sleep?" The DAT challenges participants to come up with words that are as different as possible from each other. Creativity in these tasks is measured by fluency, originality, and elaboration—all of which are language-driven metrics.
One of AI's advantages is its ability to measure these aspects precisely. Semantic distance between words or concepts is no longer solely a matter of human judgment: language models run on this very principle, which makes the related metrics far easier to obtain. As with any advance in measurement, once we can quantify something accurately, we start to uncover more about it.
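To make the scoring concrete, here is a minimal sketch of a DAT-style metric: the average pairwise cosine distance between word vectors. The word vectors below are toy values invented for illustration; the published DAT uses real distributional embeddings (e.g. GloVe), and the exact normalization varies by study.

```python
import math
from itertools import combinations

# Toy word vectors, made up for illustration only.
# A real DAT scorer would load pretrained embeddings instead.
EMBEDDINGS = {
    "fork":   [0.9, 0.1, 0.0],
    "spoon":  [0.8, 0.2, 0.1],
    "galaxy": [0.0, 0.9, 0.4],
    "sorrow": [0.1, 0.3, 0.9],
}

def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for identical directions, up to 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm

def dat_style_score(words):
    """Average semantic distance over all pairs of the given words."""
    pairs = list(combinations(words, 2))
    total = sum(cosine_distance(EMBEDDINGS[a], EMBEDDINGS[b]) for a, b in pairs)
    return total / len(pairs)

# Semantically close words score low; scattered words score high.
print(dat_style_score(["fork", "spoon"]))             # near 0
print(dat_style_score(["fork", "galaxy", "sorrow"]))  # substantially higher
```

Because the score is just geometry over word vectors, a language model that generates maximally scattered words can optimize it directly, which is exactly why high scores here say little about usefulness or appeal.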
So, AI's creativity in these tasks is clear. But does this mean humans are out of the creative equation? Not at all, and there are many reasons why not.
First, these tests measure only one aspect of creativity. In a recent paper, researchers referred to this as AI's "creative potential," noting that GPT-4 surpassed humans by a wide margin, particularly in terms of originality—comparable to the top 1% of human responses. While this is impressive, creativity isn't just about originality. It's also about the usefulness and appropriateness of an idea or product, which are difficult to standardize because they depend on culture, history, and various other factors. Measuring word distances doesn't account for these nuances.
In this broader context, GPT-4's ideas could be "vastly inferior" to those of humans. We need more research across different creative domains to determine if AI is genuinely more creative, especially in assessing the usefulness and desirability of ideas.
This point is driven home by recent work in materials science. Domain experts scrutinized Google AI researchers' claims of "discovering" 400,000 new materials, only to find that many of these compounds already existed in databases or were theoretically impossible. The materials scientists were quick to call out the AI researchers' overconfidence, stressing the importance of humility and rigorous validation in scientific discovery.
This brings us to our final point concerning the claim of “an order-of-magnitude expansion in stable materials known to humanity”. We would respectfully suggest that the work by Merchant et al. does not report any new materials but reports a list of proposed compounds. In our view, a compound can be called a material when it exhibits some functionality and, therefore, has potential utility. — Anthony K. Cheetham and Ram Seshadri
We see similar patterns in our workshops on creativity and generative AI. In these sessions, we set up a competition between participants and AI to generate as many ideas as possible for a cookie aimed at millennials who care about sustainability. ChatGPT churns out lists of "ideas" far more quickly than any human. But most of these ideas are laughably uninspired, merely pastiches of "cookie," "sustainability," and "millennial." The results? Cookies featuring various combinations of recycled items (like coffee grounds) and trendy ingredients (like avocado and chia seeds). While there are occasional flashes of creativity, most ideas are just random mixes. AI excels at exploring the combinatorial space and blending elements but lacks any sense of what people might actually want in a cookie. Sure, an avocado, chocolate chip, upcycled coffee ground, and zucchini cookie might technically be "creative," but it's hardly appealing.
The importance of relevance—usefulness and desirability—suggests we can extend the space for human-AI collaboration far more than we might intuit today. We can leverage AI to explore the intangible, ungraspable reaches of creative space, then translate those ideas into practical, real-world applications. How do humans achieve this? Through real-world, embodied experiences that are sensory and social, allowing us to learn the micro-causal structure of the world. These research papers also show that our thinking is more flexible than AI's. We do more than pull up semantic associations: we draw ideas and concepts from where we store much of our information, which is to say, from the rest of the world. Context, as always, is everything.
This is particularly interesting when you consider other creative tasks. It suggests that amplifying human creativity with AI tools goes beyond simply having AI generate lists while humans sift through them. Instead, it points toward a future design paradigm in which AI allows for increased exploration, playfulness, and guidance throughout the generative process. Currently, it's still challenging to communicate our intentions clearly to AI. Some tools are better than others: language models can be guided more effectively than the tools used for scientific discovery or image generation. However, guiding machines toward producing what we want, rather than just spitting out lists that require human sanity checks, remains a real skill.
We need designers to work their magic here: incorporating more twists and turns, enabling wider exploration, providing nudges for the imagination, and allowing longer time horizons for queries. We've barely scratched the surface of what new interfaces could be.