Announcing the Artificiality Summit 2025!
Don't miss the super early bird special at the end of the email...
This week: new research and product previews from Apple, Google, and OpenAI; an interview with Richard Kerris of NVIDIA; crafting better prompts; an interview with Tyler Marghetis; and an exploration of generative AI and flow.
This Week:
Just when you might have thought the pace of new generative AI technology would slow, Google and OpenAI released product previews that represent material leaps. Both announcements showcased technologies that are available only to limited audiences, leaving us with the feeling that both companies are posturing to ward off competition and/or stoke investor interest: OpenAI is in the midst of fundraising, and Google's stock price has lagged. While OpenAI's release is clearly remarkable, we're more focused on the fundamental changes behind Google's new version of Gemini, which, we think, should quiet the narrative that Google has been left behind.
OpenAI Sora
OpenAI released a preview of Sora, a new text-to-video technology. To date, OpenAI's image generation technology has been somewhat underwhelming compared with competition from MidJourney and Stable Diffusion. Sora, however, is a major leap forward in image quality and video duration. AI video generation currently offers exciting potential, but the reality is low-quality images, frequent errors, and videos limited to only a few seconds. The Sora demo shows videos up to a minute long with high-quality images and complex compositions like reflections. For instance, take a look at the reflections in the water, sunglasses, and even earrings in this video of a woman walking on a Tokyo street.
Sora isn’t without errors, however. OpenAI shared videos with objects morphing and spontaneously appearing, distorted body parts, and implausible movements. Even in the showcase examples, you can find errors if you look closely enough. For instance, watch people spontaneously disappear in this video of people walking along a snowy street in Tokyo.
Despite the errors, Sora represents a significant improvement over current technology. And, in a somewhat unusual move, OpenAI is not releasing Sora yet, citing safety concerns; for now, it is working with experts on misinformation, hateful content, and bias. The company also says it will embed metadata so that Sora-generated videos are easy to identify.
Related to AI & creativity, make sure to check out our interview with Richard Kerris, VP of Developer Relations and GM of Media & Entertainment at NVIDIA, about the impact he anticipates AI will have on the creative industries. And take a look at our coverage of recent Apple research presenting a unique approach to vision modeling that hints at Apple's likely strategic imperative towards heavily integrating vision models in spatial computing environments.
Google Gemini 1.5
The second major product preview this week was Gemini 1.5 from Google. Yes, Gemini was announced in December and fully released only about a week ago. Perhaps determined not to appear behind again, Google is already shifting focus to the next version by releasing research about, and developer access to, Gemini 1.5.
The two most important parts of Gemini 1.5 appear to be:

- An unprecedented context window: Gemini 1.5 Pro can process up to 1 million tokens (with up to 10 million demonstrated in research), far beyond today's production models.
- A new mixture-of-experts architecture that delivers performance comparable to Gemini 1.0 Ultra while using significantly less training compute.
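For developers with preview access, working with the larger context window should look roughly like any other Gemini call, just with far more input. Below is a minimal sketch, assuming Google's google-generativeai Python SDK; the model name, API key, and input file are placeholder assumptions for illustration, not confirmed details of the preview.

```python
# A minimal sketch of a long-context request to Gemini 1.5 Pro via
# Google's google-generativeai SDK. The model name and input file
# are assumptions for illustration; preview access is limited.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes an AI Studio key

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# The headline feature: a context window of up to 1 million tokens,
# enough for a book-length document or a sizable codebase in one prompt.
with open("entire_codebase.txt") as f:
    big_context = f.read()

response = model.generate_content(
    [big_context, "Summarize the architecture of this codebase."]
)
print(response.text)
```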
The speed with which generative AI vendors are updating products is beyond anything we've seen in previous technology shifts. And the changes don't seem to be slowing down. What to do? Plan on a continuous learning journey. Unlike with pretty much any other technology, an individual can't learn about generative AI once and expect that knowledge to stay current for long. A continuous learning journey seems to be the only way to keep pace and anticipate the future.
We have structured Artificiality Pro as a continuous learning journey specifically for this reason. Get in touch if you’d like to learn more about our upcoming research releases on AI Trust, AI-Enhanced Learning, and Human-Centered AI Design.
Gemini 1.5 Pro's ability to handle unprecedented context lengths, its superior performance compared to its predecessors, and the sustained relevance of power laws in its design all underscore the breadth and depth of Google's long-term capabilities.
By understanding the principles behind the evolving field of prompt engineering, we can craft better queries and engage more effectively with AI. These are insights we can all use to sharpen our own interactions with AI, even if we're not writing the code ourselves.
An interview with Richard Kerris, Vice President of Developer Relations and GM of Media & Entertainment at NVIDIA, about AI, creators, and developers.
It appears that there is one effect many researchers are finding across multiple fields: generative AI has a significant impact on lower-skilled and less experienced people. However, if we automate difficult tasks, we may cut ourselves off from essential components of achieving mastery, such as flow.
An interview about the lulls and leaps of human imagination with Tyler Marghetis, Assistant Professor of Cognitive & Information Sciences at the University of California, Merced.
Apple researchers recently published a paper describing a new architecture for vision models. The paper's unique approach to vision modeling hints at Apple's likely strategic imperative towards heavily integrating vision models in spatial computing environments.
The Artificiality Weekend Briefing: About AI, Not Written by AI