This 10-part series explores how Generative AI is transforming the future of work, from automation and augmentation to impacts on productivity, skills, emerging talents, and established leaders.
In today's rapidly evolving digital age, we are entering a new era – the Generative AI era. But what does this mean for the future of jobs and the workforce? Welcome to Decoding the Future, a ten-part series that delves into the profound impact of Generative AI on employment across industries, professions, and skill levels.
From the early days of manual labor to the Information Age, the way we work has continually been in flux. Now, Generative AI promises (or threatens) yet another shift. Instead of merely processing data or automating routine tasks, these AI systems generate new content, suggest novel ideas, and even engage in creative processes. They aren't just tools; they are collaborators.
In this series, we'll explore pressing questions, challenge popular myths and hype, and shed light on the nuances. How will Generative AI enhance human capabilities? How can we use first principles thinking to forecast which professions stand to gain, and which face obsolescence? How do we prepare for a world where machines don't just do as they're told, but contribute ideas and strategies of their own?
Moreover, as AI's capabilities grow, so does its potential to influence decision-making processes, strategic planning, and even areas we've historically deemed strictly human. Think art, literature, and even emotional understanding. As this technology becomes deeply interwoven into our professional lives, understanding its implications is not just beneficial—it's essential.
In this series, we'll journey through the transformative landscape of Generative AI and its profound effects on our working world. We dive deep into the details of automation, augmentation, jobs, tasks, skills, productivity, expertise, obsolescence, and why work matters.
We frame the big questions so that you can be smarter about this new breed of AI.
Join us as we frame the key issues and ideas for the future of work.
Generative AI is in the "extreme headline" stage. Studies and model-based research provide some scientific grounding to understanding its impact on jobs, but anecdotes and dubious productivity-centered studies muddy the waters.
OpenAI’s ChatGPT reopened the job automation debate. In March, the company released a paper titled “GPTs are GPTs: An early look at the labor market impact potential of large language models,” where researchers concluded that 80% of the US workforce could have at least 10% of their work tasks affected by the introduction of large language models. More recently, researchers at Princeton studied the exposure of professions to Generative AI, concluding that highly educated, highly paid, white-collar occupations may be most exposed.
The traditional approach in AI design has often hinged on the dichotomy of automation versus augmentation. Machines have typically been trusted with predictable, repeatable, high-volume tasks or calculations, while humans have taken on tasks demanding creativity, emotional intelligence, and the capacity to navigate complex, ambiguous situations.
Recent developments prompt us to reevaluate this labor division between humans and machines. We now know that large language models can, at least to some degree, be creative, reason analogically, think in metaphors, and display personality, theory of mind, and cognitive empathy. This means we need to rethink the automation versus augmentation divide, especially since AI's growing capabilities will deepen our understanding of human capabilities.
As we’ve seen before, this complexity is often glossed over when AI researchers announce that a system has hit a human-level benchmark on a specific measure of intelligence, and the result is then used to justify automating a human process. This may result in "so-so automation," where the machine, although competent, fails to match human performance in real-world scenarios. Consequently, humans are left to handle marginal yet important tasks, often viewed as tedious or mundane.
This trap of automation restricts humans to the narrow confines set by the automation design, preventing them from tackling what they excel at—navigating complexity, handling unpredictability, and making context-dependent judgments, decisions, and actions.
Even when we think about augmentation rather than automation, the approach to work is inherently biased: a narrow vision often overlooks the complexity, sociality, and interrelatedness of human work, so the practical implementation falls short. The intricacy of decisions, woven together from predictions, judgments, and actions, operates within a multifaceted system where feedback loops exist between the system and human cognition, encompassing both conscious and subconscious processes. Much of this is grounded in experiential knowledge and isn't easily quantifiable or observable.
The deployment of technology doesn't simply occur; it's a deliberate choice. By breaking away from the rigid dichotomy of automation versus augmentation, we might discover a more versatile model for envisioning the future of work.
The name of the game in worker productivity isn't merely increasing it; it's boosting demand for the output and for the skills that drive that output. But let's throw another economic concept into the mix: saturation. How does increased productivity play out when demand is fully met? If everyone in your workforce becomes 50% more productive across the board, it sounds like a dream, unless demand is sated. When that happens, the price for these skills takes a nosedive. This isn’t necessarily what we want.
The narrative is changing—what skills does AI complement, thus enhancing productivity? Who reaps these benefits? How flexible is the demand for these upgraded skills? According to economist David Autor, AI can be a strange bedfellow—it's fantastic for productivity, not so much for wealth distribution. It's a myopic and potentially misleading view to talk about productivity without bringing saturation, elasticity, and wealth distribution to the table.
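The interplay between productivity, saturation, and elasticity can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not a forecast: it assumes that the price of output falls in proportion to the labor cost per unit and that demand has a constant price elasticity. The function name `labor_demand_change` and the elasticity values are hypothetical, chosen only for illustration.

```python
def labor_demand_change(productivity_gain: float, elasticity: float) -> float:
    """Return the multiplier on labor hours demanded after a productivity gain.

    Simplifying assumptions (not from the article): price falls in proportion
    to unit labor cost, and demand has constant price elasticity.
    """
    productivity = 1.0 + productivity_gain   # e.g. 1.5 for a 50% gain
    price = 1.0 / productivity               # each unit of output gets cheaper
    quantity = price ** (-elasticity)        # constant-elasticity demand response
    return quantity / productivity           # hours needed to produce that quantity


# A 50% productivity gain under three hypothetical demand elasticities.
for elasticity in (0.2, 1.0, 2.0):
    multiplier = labor_demand_change(0.5, elasticity)
    print(f"elasticity {elasticity}: labor hours x {multiplier:.2f}")
```

Under these assumptions, when demand is close to sated (elasticity below 1) the same 50% gain shrinks the hours of work demanded, while elastic demand (above 1) expands them. That is why a productivity figure on its own says little about what happens to the people whose skills are being boosted.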
A recent study from MIT provides some intriguing insights into the interaction between generative AI, such as ChatGPT, and knowledge work. The study found that ChatGPT significantly boosted average productivity, primarily by elevating the productivity of less experienced or lower-ability workers. This uplift brought them closer to the productivity levels of their more skilled counterparts.
This impact of generative AI changes the structure of tasks. Work becomes more focused on generating ideas and editing, with generative AI smoothing the path to starting a project. This effect is particularly beneficial for those less familiar with a given task.
However, while generative AI reduces the human effort required for a task, this doesn't necessarily translate into an increase in creativity or innovation. This suggests that while AI can enhance productivity, its impact on creative outputs may be more nuanced and complex.
The notion of AI turning a novice coder into a coding wizard isn't as straightforward as it might seem. Novice programmers can become a wild card, bringing a wealth of unknown unknowns to the table, and potentially introducing errors and complexity.
In contrast, an experienced coder isn't automatically faster with AI assistance, especially when they're navigating novel, unfamiliar territory using complex code. Ultimately, how AI is harnessed hinges on the individual's perception of the problem, the AI tool, and their own skills.
Ever since the first tool was wielded, humans have wrestled with technology’s double-edged sword: the fear of being usurped by our own creations. While we’re not quite unemployed en masse yet, it's naive to dismiss the angst that new technologies send through our job markets as mere growing pains that will be cured by retraining, resilience, or universal basic income. When it comes to people, the stakes are high. Our adaptability, though profound, is not instantaneous. We need time and resources to evolve alongside our tools.
Today’s Generative AI is an unprecedented technology. For the first time, we're not just delegating tasks to lifeless tools, but entrusting decisions to digital entities with their own semblance of “thought”. The quest for automation, at its core, has always prized decision making as the pinnacle of human utility. Now, that very endeavor threatens our claim to biological “peak cognition”.
Consider the two prevailing narratives for AI’s role in the world of work: automation and augmentation. Automation, in essence, moves work from humans to machines, giving machines duties once solely our domain. It shines in repetitive, mundane, or predictable tasks that are universally shirked by us. On the flip side, augmentation amplifies our abilities, making AI our intellectual sidekick rather than our replacement. It thrives when humans enjoy their role and AI can streamline the process, making it easier, faster, and better.
In the pre-generative AI era, we compartmentalized AI and human abilities neatly. We revered AI for tasks requiring unerring accuracy and repetitiveness, and retained our faith in human superiority for endeavors calling for creativity, empathy, and abstract reasoning.
However, as we'll see, these once-clear lines are blurring. The dawn of generative AI, and the AI evolution at large, reveals a startling truth: we barely understand our own intelligence. With generative AI we can now even imagine how machines could make us more creative, caring, and connected. As burgeoning tools like advanced language models and image generators become standard in our creative, inventive, and communicative arsenal, we're forced to reevaluate our conception of human intellect and, by extension, our perception of work.
The traditional approach in AI design has often hinged on the dichotomy of automation versus augmentation. Machines have typically been trusted with predictable, repeatable, high-volume tasks or calculations, while humans have taken on tasks demanding creativity, emotional intelligence, and the capacity to navigate complex, ambiguous situations.
However, recent developments prompt us to reevaluate this labor division between humans and machines. On one hand, AI's growing potential in tasks requiring emotional awareness presents a compelling case for reducing bias and bolstering privacy.
On the other hand, as AI gets integrated into larger labor systems, a fascinating pattern emerges: one individual's automation morphs into another's augmentation. When a rare, scarce, or expert skill is automated, it suddenly opens up to a wider, less skilled demographic. While this may lead to a drop in wages for a select group of experts, it concurrently expands opportunities for many more. A classic example is the transformation of London's taxi industry: once dominated by a small number of drivers who had mastered the rigorous test known as "The Knowledge," it's now flooded with Uber drivers, ten times as many as have passed the test, thanks to Google and Uber's automation efforts.
Generative AI introduces a twist to the tale. As a cultural technology, it democratizes access to human expertise in an unprecedented manner, blurring the boundary between 'human' and 'machine' through its creative, empathetic, and reasoning capabilities. Generative AI forces us to reconsider how much we truly understand about phenomena we label as human intelligence—creativity, empathy, reasoning, invention.
This gives rise to a crucial question: how should we approach the decision to complement or replace human labor in an era teeming with cognitive, creative, and cultural AI? If we automate an expert programmer’s skills with AI, are we democratizing the world of coding, potentially reducing the expert's wages but expanding the overall pie? Or are we instead amplifying their skills, turning programmers into a coding elite, and widening the divide between programmers and non-programmers?
Generative AI is coming for white collar jobs. Or is it? With generative AI we can now even imagine how machines could make us more creative, caring, and connected. As burgeoning tools like advanced language models and image generators become standard in our creative, inventive, and communicative arsenal, we're forced to reevaluate our conception of human intellect and, by extension, our perception of work.
When considering the future of work it’s helpful to understand some key principles about human work.
Principle 1: The Interconnected Web of Work
The nature of human work isn't monolithic. A job is an intricate composition of numerous tasks. This view enables us to discern that some tasks are ripe for machine takeover, while others remain stubbornly human.
Automating routine tasks within a job is akin to refining raw ore: it removes the less valuable parts, leaving only the precious minerals. The mundane, repetitive tasks are filtered out, making room for the complex, high-value tasks to take center stage. This enrichment of the job role, in turn, increases its overall worth, especially if it stimulates the demand for the output.
Jobs, in this sense, are more fluid than fixed. They aren’t isolated islands in the vast ocean of work. Instead, they are dynamic, evolving entities, prone to fragmentation and fusion as technology and business models evolve. A job doesn't simply vanish into obsolescence; it unravels into its constituent tasks, which then become threads in new, unimagined work tapestries.
Principle 2: Prediction Machines: Increasing Value Through Prediction
A fundamental truth about AI is its role as an expert predictor. As we dissect work into tasks, it's crucial to identify those that hinge on prediction or could significantly improve with more affordable forecasts or inferences. The challenge lies in redefining our perception of prediction. We need to move beyond seeing it merely as a forecast of the future and recognize it as a tool for completing an incomplete picture. The art here lies in envisioning how predictions could elevate the value of a task.
As AI reduces the cost of prediction, it inadvertently enhances the value of judgment and action. Why? Because cheaper predictions open up a broader realm of possibilities for decisions and actions. The more we know about the possible outcomes, the more nuanced and strategic our judgments can become, and the more precise our actions can be. Therefore, AI's prediction prowess amplifies the value of human judgment and action.
Principle 3: Twins of Cognition: Prediction and Judgment
Peering into the workings of a human mind reveals a process far more complex than executing a single task step. An analyst crafting a financial model is simultaneously distilling wisdom from a sea of data. A doctor arriving at a diagnosis is concurrently gauging the potential repercussions on the patient's health and formulating a context-sensitive, personalized treatment plan. A programmer must weigh the importance of a slew of software tests based on the client's unique context.
The essence of human work is decision making. A decision is a context-laden, back-and-forth between prediction and judgment, often under broad uncertainty. Unconsciously, we fuse these two elements in our cognitive process—prediction, a process of filling gaps or foretelling the future, and judgment, a quality assessment of the decision or a confidence check. However, when machines pry prediction apart from judgment, the role of AI becomes more distinct, allowing us to discern the tasks that remain uniquely human.
While the effects of automation and AI in the workplace can often defy our intuition, and much of the discussion remains rooted in theoretical territory, recent studies on AI adoption allow us to sketch some broad conclusions and considerations for companies.
It turns out that human agency (and a desire to assert it) matters. People are complex and unpredictable and sometimes like to do things "just 'cause." Special circumstances can exacerbate the impact of human agency, especially when stakes are high, consequences have social costs or benefits, and where emotions impact how technology is used.
The influence of AI on productivity presents a convoluted puzzle, which Generative AI only complicates further. As it diffuses across areas of knowledge work, human creativity, and inventiveness, the dialogue around AI's impact on productivity becomes more complex and intricate.
Daron Acemoglu, a giant in the realm of labor and AI studies, suggests that the lack of productivity growth in developed economies may be explained by an unbalanced focus on automation. In this perspective, we often overlook the realities of automation that falls short of excellence. Automation, it seems, carries a bias in our society: a proclivity towards the capitalistic ideals of economic efficiency, with structural elements that favor capital over labor.
Indeed, this is the predominant narrative in technology circles: humans are portrayed as unpredictable, unreliable, complacent, troublesome, not uniform, uncontrollable, and imperfect. Reducing labor costs is not inherently wrong—after all, we all appreciate more affordable products. However, when the narrative strips away human involvement entirely and lacks intellectual honesty about the limitations of machines, we start to spin in circles, trapped within the narrow confines of job definitions and losing sight of potential avenues for innovation.
If we permit the narrative to evolve into one that exclusively emphasizes machine cognition, at the cost of our shared intelligence, we risk undermining our collective ability to tackle pressing issues. This very notion leans into a nuanced critique of capitalism and the tech giants.
The disparity between job loss and job creation is a tough concept for many to grapple with. It's relatively easy to foresee the jobs that might be eradicated, but it's considerably more challenging to envisage the ones that might spring up. We should shift our attention towards the issues that we are yet to solve—some of which are colossal, requiring global coordination and collective intelligence, of which machines can only tackle a part. If someone insists that their digital avatar interacting with your avatar will magically resolve all your shared issues, they are either profoundly misguided or blatantly deceptive.
But what about Universal Basic Income? If AI can unleash enormous productivity, can’t we just all retire and go to the beach?
The proposal for Universal Basic Income (UBI) is a distraction in the broader societal conversation we should be having about the future of work. It presents a simplistic solution to complex issues, and worse, it may potentially veil an embedded bias of technocratic and undemocratic views.
Common sense tells us that the feasibility of implementing UBI in the US political climate is profoundly questionable. Our system routinely grapples with, and often fails to pass, policies that are significantly less radical than UBI. From universal healthcare to affordable higher education, we have seen substantial resistance to initiatives aimed at ensuring basic necessities for all. The idea of UBI, with its inherent requirement for dramatic wealth redistribution, is far more radical, and it's not unreasonable to expect even more intense opposition. Emphasizing UBI as the panacea for our social and economic issues distracts from more achievable goals that could make a meaningful impact on people's lives.
Moreover, the UBI discussion subtly undermines the inherent value of work, and this is where it inadvertently aligns with hyper-capitalistic and technocratic perspectives. UBI, with its implicit message that work is a burden to be avoided, fuels a narrative that could lead to increased mechanization and displacement of human labor. While it's crucial to challenge exploitative work practices and strive for better labor conditions, the answer isn't to negate work but to improve its quality and ensure its equitable distribution.
Why are technocrats so fixated on the idea of automating humans out of the picture? This idea stems from a belief that humans are the weak link: we just aren't productive enough and can't ever be. In this narrative, a transition to an all-encompassing superintelligence or artificial general intelligence would instigate an unprecedented productivity leap. This, in turn, would massively reduce costs and potentially create an economy where income from personal data could be sufficient to sustain livelihoods.
The conversation should not revolve around compensating people for the "end of work" due to automation but rather be centered on how we can leverage technology to create more meaningful work and a more equitable society.
Programmers are at the forefront of understanding the effects of Generative AI on jobs. As they adapt and interact with this evolving technology, their experiences provide a direct window into how GenAI is poised to transform the employment landscape for everyone. Their role, central to the development and deployment of AI, makes their observations and challenges especially telling for the future of work.
A study conducted on GitHub Copilot's effect on software developers' productivity gives us an early look at possibilities. Copilot helps developers maintain their creative momentum by auto-completing code. According to GitHub, Copilot churns out a whopping 46% of the code and turbocharges developers' coding speed by up to 55%. The ambition here is to revamp developer productivity by subtracting mundane tasks and complex development work from their to-do list. Liberated from the drudgery, developers can invest their energy into creativity and innovation.
At first glance, these figures seem astronomical. Similar productivity leaps are being touted across all domains of knowledge work. The interpretations of these advancements vary wildly, from doomsday prophecies of human obsolescence to optimistic visions of a productivity revolution sparking remarkable progress. But how should we interpret the automation of knowledge work, a domain that intertwines social and analytical reasoning, and balances decisions on a fulcrum of experience, intuition, analysis, and observation?
A blind spot in a narrowly focused productivity conversation is the tricky business of assessing what exactly elevates productivity in knowledge work. This dilemma comes into sharp relief in the realm of AI-enhanced software development. Developers using AI aids can feel more productive than they actually are—their speed at churning out code often tells a different story. Developers also navigate a form of on-the-fly cost-benefit analysis, moderated by their familiarity with the task and the complexity of the AI's suggestions. This interplay means the effectiveness of AI is inextricably linked with the developer's evolving grasp of the situation they're tackling step by step.
Programming turns out to be a fascinating case study when it comes to AI-enhanced productivity. Some outcomes are glaringly obvious—like boilerplate code spewing forth effortlessly from an AI tool. However, other impacts are subtler: programmers could lean too heavily on AI tools, adhering to rigid standards that stifle evolution or even create needless complexity. Worse yet, when glitches arise from AI suggestions, unraveling them can erode the benefits of automated code.
Our examination of programming unveils crucial insights into how humans use AI in knowledge work. The crux of the matter lies in an ever-shifting landscape of decisions, underpinned by two variables: the complexity of both the task at hand and the AI suggestion, and the programmer's familiarity with the code, in the context of both the task and the programming language used. But here's the kicker: neither complexity nor familiarity comes pre-packaged with a standard definition. They're unique artifacts, shaped by each programmer's perception.
Drawing definitive conclusions from these limited, early-stage studies could lead to inflated estimates of Generative AI's impact. Yet, there's an intriguing trend in the data: tools like ChatGPT appear to affect beginners and experts differently. It's premature to draw broad generalizations—the key question is whether Generative AI primarily uplifts the less experienced or skyrockets the experts even higher. The answer likely depends on the nature of expertise, the context of the role, and the broader system encompassing these skills.
Programmers are integral components in the techno-social ecosystem. Programmers are embedded within our modern technological and social structures. Their skills and actions directly influence these systems, making their work highly relevant and representative of our broader societal interactions with GenAI.
The nature of programming work allows for tangible, quantifiable results. We can measure efficiency, error rates, code complexity, and other metrics, offering an insightful lens through which to observe the integration of GenAI.
Programming languages present a unique case study in how GenAI interacts with structured data. The path that GenAI takes through the training data with a programming language differs from its approach to natural language, adding a distinct layer to our understanding of GenAI's functionality.
Programming tasks can be broken down into smaller components more readily than tasks in other fields. This disaggregation allows us to understand the tasks more fully and to observe how they are influenced by GenAI more directly.
In programming, it is easier to dissect the work process into its key aspects: decisions, predictions, judgments, and actions. This dissection provides a window into the different facets of human creativity and how they interact with GenAI, offering insights that may guide future development and integration of AI systems in the workplace.
Programmers, by the nature of their work, are often at the forefront of technological evolution, learning and adapting to new tools and languages. This capacity to adapt makes them an ideal demographic for studying the acquisition of new skills and techniques in the age of GenAI.
Finally, the interaction between programmers and GenAI provides an opportunity to explore and understand the dynamics of human-AI collaboration, vital for understanding the emerging system.
AI is shaking up the way humans develop expertise and how it's valued. It can make errors in unpredictable ways, struggle with adapting to new environments, and create more decisions for humans to handle. AI separates the prediction of an outcome from judging its meaning, adds complexity to decision-making, alters expectations of humans when machines fail, and changes the nature of learning.
Becoming an expert often involves a period of apprenticeship, but AI has the potential to disrupt this process fundamentally. What happens when AI demands high levels of human expertise, yet its usage inadvertently hinders the development of that expertise?
The aviation industry offers several case studies. In 2008, during a flight from Singapore to Perth, the Qantas Airbus A330's flight computer malfunctioned, causing the plane to violently nose-dive. The system mistakenly believed the aircraft was stalling.
Captain Kevin Sullivan, a former US Navy fighter pilot, instinctively grabbed the control stick when the plane's nose pitched down. He pulled back, but nothing happened. Then, he did something counterintuitive – he released the stick. Drawing upon his years of experience as a fighter pilot, Sullivan trusted his intuition and flew the plane manually. Throughout the ordeal, the pilots received no explanation from the computer and had to rely on their own judgment.
Almost an hour after the first dive, Sullivan managed to land the plane safely at a remote airfield in northwestern Australia. While more than 100 people were injured, some seriously, there were no fatalities. Sullivan's experience and expertise enabled him to recognize the computer's erroneous interpretation of the situation. His years of traditional, high-pressure flying had equipped him with the skills necessary to intervene and take control when automation failed.
Picture the stark contrast between the fighter jets Sullivan trained in and today's modern aircraft: hand flying, a tactile experience in which the pilot's control stick connects to the plane's control surfaces through wires and pulleys, versus the sophisticated fly-by-wire systems of the 21st century. With modern systems, a side-stick's electronic connection tells the computer what to do, replacing the old mechanical link.
Fly-by-wire systems and cutting-edge electronics offer increased safety and ease of use. But, in the heat of an emergency, they can leave pilots feeling disoriented. Nowadays, regulations forbid pilots from hand flying at cruising altitude, meaning many have never experienced the sensation of an aircraft's natural response at that height.
QF72's harrowing story raises a crucial question about AI's role in the physical world: How can we strike a balance between AI systems designed to replace humans most of the time and the undeniable need for human expertise to ensure consistent reliability? AI systems can falter at critical moments, calling for human intervention. The real challenge lies in fostering and maintaining high levels of expertise in a world increasingly dependent on AI systems that replace human action in most situations.
Imagine this conundrum: we're trying to develop expertise, but the very nature of automation aims to eliminate the need for it. It's a mind-bending paradox that leaves us questioning the future of human proficiency in an increasingly automated world. It's an energetic debate that demands our attention and forces us to contemplate the consequences.
Take, for instance, the daunting task of integrating self-driving vehicles into our everyday lives. Sure, it's one thing to train a small, elite group like pilots, but it's a whole different ball game when we're talking about someone with only a few hours of driving experience taking the wheel of a self-driving car. The only viable long-term solution for widespread adoption of self-driving cars is to create a system where human intervention is never expected or required.
Now, let's consider the broader implications: What happens when AI empowers non-experts to perform tasks that were once exclusive to experts? The rapid advancements in large language models and generative AI are making this a reality, shaking up the very foundations of careers, jobs, and our overall well-being.
Will the push for automation lead to a world where human expertise is undervalued, or can we strike a delicate balance that allows both human mastery and AI to coexist and complement each other?
Recently, MIT researchers had humans work with a cutting-edge AI system to build a website, using GPT-3 to generate HTML code. The results? Human programmers using GPT-3 were a whopping 30% faster. Even non-programmers, who would otherwise struggle with HTML code, could create websites from scratch using AI, and just as fast as expert programmers. Talk about a game-changing transfer of value from expert to non-expert. So, in a world where non-experts can perform tasks just as well as the pros, why bother paying for expertise?
But things aren't always as straightforward as they appear. Sure, the non-programmer might hit a roadblock and need expert help, at least for now. But expert programmers can harness the power of LLMs to speed up their work, tackle mundane or repetitive tasks, and push the boundaries of what's possible. By doing so, they could have more time to focus on the bigger picture, like addressing data bias issues or bringing different stakeholders to the coding table. As AI changes the landscape, what we value in a programmer may evolve.
Paradoxically, as AI threatens the development of expertise, it only amplifies its ultimate value.
In 2016, Geoffrey Hinton, a pioneer in deep learning, boldly claimed, "We should stop training radiologists now, it's just completely obvious within five years deep learning is going to do better than radiologists." Similarly, in 2017, Vinod Khosla, a prominent venture capitalist, asserted that "the role of the radiologist will be obsolete in five years." Radiology, it seemed, was destined for obsolescence, with artificial intelligence (AI) taking over the reins by 2020. Machines, according to Oxford economists, would replace doctors as many tasks within professional work were deemed routine and process-based, lacking the necessity for judgment, creativity, or empathy.
Yet, these predictions from technology visionaries failed to materialize. What led to their glaring misjudgment? And, more broadly, what can we learn about AI-driven human obsolescence?
Sensitivity Versus Specificity
A seminal 2013 radiology study played a significant role in shaping the discourse on automation in medicine. In this study, twenty-four radiologists participated in a familiar lung nodule detection task. Researchers surreptitiously inserted the image of a gorilla, forty-eight times larger than the average nodule, into one of the cases. The findings were astonishing: 83 percent of radiologists failed to see the gorilla, even though eye-tracking data showed that most were looking directly at it.
Inattentional blindness, a phenomenon that can affect even the most skilled experts in their respective domains, reminds us that humans, no matter their level of expertise, are fallible. When engrossed in a demanding task, our attention behaves like a set of blinkers, preventing us from seeing the obvious.
In mathematical terms, this can be described as a failure in sensitivity. The radiologists in the aforementioned study were unable to detect a conspicuous anomaly in the image: a false negative, in this case. AI can compensate for this bias by collaborating with radiologists to screen for any potential abnormalities, such as an unexpected gorilla.
However, sensitivity alone does not encompass the entire diagnostic process. Radiologists must also accurately identify negative results, a measure known as specificity, to avoid raising false alarms. Humans excel at determining whether a suspicious finding flagged by AI is truly a cause for concern.
Generally speaking, machines demonstrate superior sensitivity (identifying deviations from the norm), while humans exhibit greater specificity (assessing the significance of these deviations). Sensitivity and specificity are interdependent; for a given system, tuning it to raise one typically lowers the other. Designing a machine with both very high sensitivity and very high specificity is therefore extremely difficult, as some trade-off between the two is hard to avoid. This is why the partnership between AI and radiologists proves superior to either working independently: the collaboration strikes an optimal balance between expert machine and expert human.
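To make the trade-off concrete, here is a minimal Python sketch. The scores and labels are invented for illustration (they are not from any radiology study), and the function name is hypothetical. It assumes a detector that outputs a suspicion score per image and flags a case whenever the score reaches a threshold.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Flag a case as positive when score >= threshold, then compare with truth.

    labels: 1 = abnormality really present, 0 = really absent.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)  # true positives
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)   # missed cases
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)   # true negatives
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)  # false alarms
    sensitivity = tp / (tp + fn)  # share of real abnormalities that were caught
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    return sensitivity, specificity


# Hypothetical suspicion scores for ten images (illustrative assumption).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.25, 0.55, 0.75):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

On this toy data, a low threshold catches every abnormality but raises more false alarms, while a high threshold raises none but misses half of the real cases. A second reader, human or machine, with different strengths is one way to move past that single dial.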
This insight reveals a key reason why Hinton and Khosla's predictions missed the mark—ironically, they overlooked the statistical nature of diagnosis and the necessity for humans to address machine errors.
Diagnosis is not a simple binary process of yes or no. Imperfections in tests and the ever-present possibility of errors necessitate accounting for false results. Designing AI with both low false positive and low false negative rates proves to be a challenging endeavor. Instead, a more effective approach involves creating machines that compensate for errors humans are prone to, while capitalizing on the innate strengths of human expertise.
Nature of Decisions
Yet, there's another dimension to this narrative. As researchers and practitioners observe the collaboration between human and machine, they're witnessing a shift in the perception of diagnostic accuracy. Prior to AI's integration into the workforce, image-based diagnosis was primarily concerned with detection, posing the question: "did we find something that looks wrong?"
With AI now identifying a greater number of lesions or areas warranting further examination, radiologists are devoting more time to determining the significance of these findings. The central question has evolved into: "is this anomaly associated with a negative outcome?"
This example offers a second clue as to why technology experts misjudged the situation—AI has effectively bifurcated the diagnostic decision-making process. Previously, human radiologists made decisions that combined prediction and judgment. However, when humans make decisions, the prediction (such as an abnormal lesion) is often indistinguishable from the judgment regarding the danger it poses (whether the lesion is problematic). AI disentangles prediction and judgment in decision-making, leaving the human to exercise judgment. This separation can be subtle, with humans sometimes unaware that they're making a prediction as part of a decision.
AI has redefined the diagnostic landscape. Radiologists now face an increased volume of disease assessments generated by AI, and must evaluate whether a positive result carries implications for clinical outcomes. This intricate judgment, taking into consideration potential interventions and associated risks, calls for the very empathy and creativity that technology forecasters prematurely dismissed as obsolete.
System Effects
Moreover, for AI to become a genuinely valuable tool in radiology, radiologists themselves must take on the responsibility of training, testing, and monitoring outcomes. A data scientist's expertise can only go so far; the radiologist's judgment in assessing the connection between diagnosis and clinical outcome is vital. As researchers noted in The Lancet, "Unless AI algorithms are trained to distinguish between benign abnormalities and clinically meaningful lesions, better imaging sensitivity might come at the cost of increased false positives, as well as perplexing scenarios whereby AI findings are not associated with outcomes."
Hinton and Khosla also failed to account for the practical challenges AI encounters in the real world. Theoretical success doesn't always translate into practical effectiveness, as reality often proves more complex than our assumptions. By 2020, a mere 11 percent of radiologists reported using AI for image interpretation. This low adoption rate is primarily due to AI's inconsistent performance: 94 percent of users experienced variable results, while only 5.7 percent reported that AI consistently worked as intended. This level of reliability is insufficient to gain the trust of doctors.
The chasm between AI's potential and its real-world application is intricate. AI model development begins with testing in a highly controlled and limited environment. Machine learning engineers collaborate with a select group of experts to train a model, evaluate its performance, and then deploy it within a specific setting—such as a radiology department in a single hospital.
AI luminary Andrew Ng, known for his work at Google Brain and Baidu, has shed light on the challenges of transferring AI models between environments. When AI is trained and tested in one hospital—typically an advanced or high-tech facility—researchers can demonstrate its performance on par with human radiologists. However, when the same model is applied to an older hospital with dated equipment and differing imaging protocols, the data becomes inconsistent, leading to a decline in performance.
This issue of model transferability is another crucial factor in Hinton and Khosla's misjudgment. An AI trained in one location may not be dependable in another. In stark contrast, a human radiologist can effortlessly transition from one hospital to another and still "do just fine."
Enter Foundation Models
The emergence of ubiquitous foundation models could reshape the landscape of AI applications in radiology and other domains. These models, pre-trained on vast amounts of data from diverse sources, could be further fine-tuned to specific environments and tasks, potentially improving transferability and performance. By incorporating diverse imaging protocols and equipment types, these models might better adapt to different hospitals, addressing the current limitations in model transferability.
However, the rise of foundation models also presents its own set of challenges and risks. As models become more intricate and expansive, interpretability becomes increasingly difficult, raising concerns about transparency and accountability in decision-making. If history is anything to go by, this will give rise to new skills and tasks for future radiologists.
In a twist of fate, rather than radiologists becoming obsolete, their number in the US increased by around 7 percent between 2015 and 2019. There is now a global shortage of radiologists, in part due to an aging population's rising demand for imaging. Ironically, the bottleneck in radiology now lies in training.
The prevailing sentiment is that "AI won't replace radiologists, but radiologists who use AI will replace those who don't." Far from becoming obsolete, radiologists are in high demand, partially thanks to the benefits AI brings to the field. As image interpretation becomes more cost-effective, the demand for diagnostic imaging rises, as do the other functions of the role. As diagnostic complexity grows, the value of human judgment increases in tandem. Deciding the appropriate course of action requires a holistic, synthesized, team-based, and personalized set of decisions, not just a single readout from an image.
Perhaps more importantly, people are inherently resistant to obsolescence. We've witnessed this throughout history, from the Luddite uprisings to the more recent populist backlash against globalization. Powerful AI will undoubtedly create pockets of obsolescence across various professions. However, disruption isn't guaranteed; society has choices regarding transitions. But to make wise decisions, we'll need an abundance of collective intelligence.
Originally published in Quartz, January 2018
As world leaders gather for Davos, one of the common and continuing themes is the emerging threat of automation and the consequent effect on economic inequality and global stability. Responding to the so-called fourth industrial revolution has become one of the biggest topics of discussion in the world of technology and politics, and it’s not surprising that anxiety runs high.
A lot of the current conversation has been shaped by research with scary conclusions such as “47% of total US employment is at risk from automation.” In a survey last year of 1,600 Quartz readers, 90% of respondents thought that up to half of jobs would be lost to automation within five years, and we found that everyone thought it was going to happen to someone else. In our survey, 91% of those who work don’t think there’s any risk to their job, for example.
If it’s true that half the jobs will disappear, then it’s going to be an entirely different world.
As leaders and policy makers consider the broader implications of automation, we believe it’s important that they remember that the predictions and conclusions in the analytically derived studies—such as the 47% number—come from just a few sources. All the studies on the impact of AI have strengths and weaknesses in their approach. To draw deeper insight requires taking a closer look at the methodology and data sources they use.
🤖🤖🤖
The studies
We have attempted to summarize the outputs and approach of three studies, from Oxford University, McKinsey Global Institute, and Intelligentsia.ai (our own research firm acquired by Quartz in 2017). We chose the Oxford study because it was the first of its kind and highly influential as a result. We chose MGI because of its scale. And we chose our own because we understand it in great detail.
🤖🤖🤖
Our conclusions
We conducted our own research because we wanted to understand the key drivers of human skills and capability replacement. We were both surprised and pleased to find that, even though machines indeed meet or exceed human capabilities in many areas, there is one common factor in the research that artificial intelligence is no match for humans: unpredictability. Where a job requires people to deal with lots of unpredictable things and messiness—unpredictable people, unknown environments, highly complex and evolving situations, ambiguous data—people will stay ahead of robots. Whether it’s creative problem solving or the ability to read people, if the environment is fundamentally unpredictable, humans have the edge. And likely will for some time.
In fact, we found four themes where jobs for humans will thrive:
When work is unpredictable, humans are superior.
Our conclusions about their conclusions
In all of the studies, researchers had to grapple with the sheer level of uncertainty in the timing and degree of technological change. This is a conclusion in itself and a serious challenge for policy makers whose goal it is to plan for social support and education across generations.
Common across the studies was a recognition of a new kind of automation: one where machines learn at a scale and speed that has fundamentally changed the opportunity for AI systems to demonstrate creative, emotional and social skills, skills previously thought of as solely human. Machine-learning systems operate not as task-specification systems but as goal-specification systems. This is important because it means that, increasingly, many automated systems adapt and reconfigure themselves on their own.
The biggest weakness of all the studies is that jobs aren’t islands; boundaries change. The story of automation is far more complex and beyond the reach of the models and the data we have at hand. Jobs rarely disappear. Instead, they delaminate into tasks as new technology and business models emerge.
None of these studies is able to forecast the impact of reimagining scenarios of business process changes that fundamentally alter how an occupation functions. None of them can take into account the “last mile” of a job, where the automation can be relied upon for 99% of the job but it still takes an on-the-job human to do the 1%. None of them conveniently spit out what knowledge will be most valuable.
There are counterintuitive effects to automation such as how the value of a job changes after the automation of one component. If a specific task in a job is automated, creating value through an increase in productivity, it tends to raise the value of the whole chain of tasks that make up that job. So investment in capabilities that can’t be automated will be a good investment.
Finally, there are new jobs. We are far from solving all the world’s problems and we have an insatiable appetite for more. Just because people today can’t think of the new jobs of tomorrow doesn’t mean someone else won’t.
🤖🤖🤖
A note on the data
The common data set used by many of the big studies is O*Net (Occupational Information Network). This is the best data, anywhere. It was built for the US Department of Labor, primarily to help people match things they care about (such as skills, knowledge, work style and work preferences) to occupations. For every occupation, there is a different mix of knowledge, skills, and abilities for multiple activities and tasks. When all of these are described and assigned standardized measures such as importance, frequency, and hierarchical level, the final O*Net model expands to more than 270 descriptors across more than 1,000 jobs.
Why does all this matter? Because this level of complexity is what it takes to make it fit for purpose. The data isn’t gathered for the purposes of analyzing automation potential so any and all automation modeling has to transform this complex and handcrafted dataset. Subjective judgements of researchers or statistical manipulation of standard measures are the most important new inputs to a novel use of this data store. There’s a lot of room for fudging, personal bias and lies, damned lies. Absurd results can happen. Previously, when the data was used to predict offshorability, lawyers and judges were offshored while data entry keyers, telephone operators and billing clerks could never be.
Still, it’s the best data available and if it’s good enough for designing jobs, it’s probably good enough for deconstructing them. It’s all a question of how and where data is manipulated to fit the modeling goal.
🤖🤖🤖
Evaluating those studies in detail
Oxford University
The detail: This research, first published in 2013, kicked off the automation story with the finding of 47% of total US employment being at risk from automation. This was an academic study to figure out the number of jobs at risk. It turned out that it wasn’t realistic to pinpoint the number of jobs that would actually be automated so instead they developed a model that calculated the probability of computerization of any given job.
The most important human touch was a binary yes/no assessment of the ability to automate a job. In a workshop at the University of Oxford, a handful of experts, probably clustered around a whiteboard, went through a sample list of 70 jobs, answering “yes” or “no” to the question: “Can the tasks of this job be sufficiently specified, conditional on the availability of big data, to be performed by state of the art computer-controlled equipment?” We don’t know what jobs they chose, though it’s safe to assume the people in the room were not experts in those 70 jobs. Nor do we know whether there was enough tea and biscuits on hand for them to be able to think as deeply about job number 70 as job number 1.
The researchers were super aware of the subjective nature of this step. The next step was designed to be more objective and involved ranking levels of human capabilities to find the nine most human capabilities that matched to three engineering bottlenecks they were interested in: perception and manipulation, creativity, and social awareness. From this ranking, they were then able to apply statistical methods to come up with probabilities of these capabilities being computerized and therefore the probability of any whole job being automated.
The limitations of their approach are twofold. First, they looked at whole jobs. In the real world, whole jobs are not automated; parts of jobs are. It’s not possible to fully critique the effect of this on the final results (it’s all hidden in the stats), but it’s intuitive that treating jobs as wholes leads to a highly overstated result. Second, using “level” as the objective ranking mechanism introduces an important bias. Machines and humans are good at different things. More importantly, what’s easy and “low level” for a human is often an insanely difficult challenge for a machine. Choosing “level” as the primary objective measure risks overstating the risk to low-wage, low-skill perception and manipulation-heavy jobs that operate in the uncontrolled complexity of the real world. Given the researchers would have been very aware of this effect, which is known as Moravec’s Paradox, it’s surprising that they didn’t specifically account for it in the methodology. It is potentially a significant distortion.
One more thing. The researchers did not take into account any dimensions of importance, frequency, cost, or benefit. So all capabilities, whether important or not, used every hour or once a year, highly paid or low wage, were treated the same and no estimates of technology adoption timelines were made.
So while this is a rigorous treatment of 702 jobs representing almost all the US labor market, it has the limitation that it relied on a group of computer-science researchers assessing jobs they’d never done at a moment in time when machine learning, robotics, and autonomous vehicles were top of mind and likely firmly inside their areas of expertise (as opposed to, say, voice, virtual assistants and other emotional/social AI) and without any way of modeling adoption over time. A figure of 47% “potentially automatable over some unspecified number of years, perhaps a decade or two” leaves us hanging for more insight on the bottlenecks they saw and when they saw them being overcome.
Perhaps their most important contribution is their crisp articulation of the importance of goal specification. Prior waves of automation relied on human programmers meticulously coding tasks. Now, with machine learning, particularly with significant progress being made in reinforcement learning, the important insight is that it’s far more important to be able to specify the goal for an AI than to input defined tasks for the AI to perform. In many circumstances, there are now the tools for machines to figure out how to get there on their own. Creative indeed.
McKinsey Global Institute
The detail: The heavyweights of business analysis, MGI published their report in early 2017 analyzing the automation potential of the global economy, including productivity gains. It’s comprehensive and the analytical process is extensive. They recognized the weakness of analyzing whole jobs and, instead, used O*Net activities as a proxy for partial jobs. They also introduced adoption curves for technology so they could not only report on what’s possible but also on what’s practical. As such, their conclusions were more nuanced with around 50% of all activities (not jobs), representing $2.7 trillion in wages in the US, being automatable. They found that less than 5% of whole jobs were capable of being fully automated. Adoption timing shows a huge variance with the 50% level reached in around 2055—plus or minus 20 years.
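The gap between "share of activities that can be automated" and "share of whole jobs that can be automated" is easy to see in a toy calculation. The sketch below is not MGI's model: the four jobs and their activity flags are invented, and every activity is weighted equally, whereas a real analysis would weight by time spent.

```python
from typing import Dict, List

# Hypothetical jobs, each a list of activities flagged True if automatable.
# These flags are illustrative assumptions, not O*Net or MGI data.
jobs: Dict[str, List[bool]] = {
    "job_a": [True, True, False, False],
    "job_b": [True, False, False],
    "job_c": [True, True, True],
    "job_d": [False, True, False, True, False],
}


def share_of_activities_automatable(jobs: Dict[str, List[bool]]) -> float:
    """Fraction of all activities, pooled across jobs, that are automatable."""
    flags = [flag for activities in jobs.values() for flag in activities]
    return sum(flags) / len(flags)


def share_of_jobs_fully_automatable(jobs: Dict[str, List[bool]]) -> float:
    """Fraction of jobs in which every single activity is automatable."""
    fully = sum(1 for activities in jobs.values() if all(activities))
    return fully / len(jobs)


print(f"activities automatable: {share_of_activities_automatable(jobs):.0%}")
print(f"jobs fully automatable: {share_of_jobs_fully_automatable(jobs):.0%}")
```

In this toy set, a bit more than half of the activities are automatable, yet only one job in four could be handed over entirely. One non-automatable activity is enough to keep a human in the job, which is why activity-level and job-level headlines can differ so much.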
MGI took around 800 jobs (from O*Net) and their related 2,000 activities, which they then broke into 18 capabilities. These 18 capabilities, with four levels each, were uniquely designed by the MGI team. This capability/level framework is at the core of the technological potential analysis. These 18 capabilities are perhaps the most important human-touch point of the MGI analysis. “Academic research, internal expertise and industry experts” informed this framework. Their framework offers a far more appropriate description of human skill-level in relation to automation than does the O*Net data. This framework was then used by experts to train a machine learning algorithm and apply the capabilities across 2,000 O*Net activities to create a score for each activity. There’s some “secret sauce” at work here. It’s impossible for any outsider to tell how capability levels or automation potential are assigned against activities. It’s a mix of human, machine, and consulting nuance.
Finally, to analyze technical potential, they developed “progression scenarios” for each capability. This step must have taken quite some effort. Surveys, extrapolation of metrics, interviews with experts, recent commercial applications, press reports, patents, technical publications, Moore’s Law all went into the mix. Perhaps there was some key factor in the model that got tweaked at the last minute by an individual analyst. We’ll never know. Nevertheless, they are experts with access to vast resources of academic expertise and they have a ton of practical operating chops.
In the second major stage of their analysis, they created adoption timelines. Here, they use data from 100 automation solutions that have already been developed and create solution times for the 18 capabilities. To assess the impact of automation across industries and jobs, they use proxies from example jobs (there’s a lot of expert consulting input to this) to convert the frequency of an activity into time spent in a job, leading finally to economic impact by industry, activity, and job. This is the sort of modelling that only a handful of groups can pull off. With so many inputs and the creation of databases as well as models, it would be daunting to recreate.
Weaknesses? The MGI analysis has two important limitations. First, by using only the activities from O*Net and defining their own capabilities, they miss the rich detail of how important a given capability is for a job; instead, all capabilities are treated as equally important. This may have the effect of underestimating the incentive to automate particular activities in higher wage jobs where the importance of a capability is high, say, an interface between professionals and clients. Second, how they determined adoption timelines is completely opaque to an outsider. But, because of the huge uncertainty of the 40-year span, it doesn’t really matter. What’s important is that one of the world’s premier analytical agencies has been able to demonstrate just how uncertain this all is. The takeaway is that there’s no way to get a sense overall of when the breakthroughs may happen and how they may affect jobs. The most difficult job now? Being a policymaker developing long-range plans in such an uncertain techno-socio-political environment.
A key piece of information that is easily overlooked in the MGI report is how much more there is to harvest from current systems and how big the opportunity is to make business technology interfaces more human and more seamless. Maybe it just doesn’t sound sexy when “collecting and processing data” is put up against other, more exciting ideas but these activities consume vast amounts of time and it’s a safe bet that it’s one of the most boring parts of many people’s jobs. Even with the Internet of Things and big data technologies, there’s still an enormous amount of data work that’s done by human hand, consuming hours of time per day, before people get on with the real work of making decisions and taking action. With advances in conversational AI and vision interfaces, we would expect to see an explosion in developments specifically to help people better wrangle data.
Intelligentsia.ai
The detail: At Intelligentsia.ai, we were fascinated by the debate over automation of jobs and decided to do our own analysis of the opportunity to invest in automation, that is, invest in some new kind of machine employee. We, too, turned to O*Net. “Level” was required but not enough to really understand the incentive to invest; we needed both level and importance.
Our methodology did not employ any statistical methods or machine learning. We had to scale everything by hand. Our subjective assessments were primarily predictions of what a machine can do today versus what we think a machine will be able to do in 20 years. This relied on both research expertise and considered opinion plus the concentration it took to assess and rank 128 capabilities.
Our view is that there is more intra-technology uncertainty than there is inter-technology uncertainty. That is, there’s more chance of being completely wrong by forecasting a single technology than across a set of technologies so we felt comfortable that technology forecasting uncertainty would broadly average out across the analysis. However, it’s the biggest weakness in our analysis, primarily because it would be highly unlikely that we or anybody else could reproduce our technology capability curves.
We used these forecasts to determine when a machine could match each capability within a job. This allowed us to create an attractiveness ranking using both importance and skill for each job to which we could apply a dollar figure. From there it was an Excel number crunch to create a list of the most attractive AI capabilities to invest in and the most likely jobs to be impacted. We found a market opportunity for machine employees of $1.3 trillion in the US. Because we weren’t trying to determine the jobs at risk, we didn’t get to a set percentage. However, we did find the percentage of capabilities where a machine could perform the role as well as a human to max out at around 46% in 10 years and 62% in 20 years. Most jobs were significantly less than this.
From our admittedly biased perspective, the most useful part of our analysis was that it helped to home in on the best opportunities for investing in AI in the next 10 years. If you’re an entrepreneur and want to create products with the greatest market opportunity, invest in AI for combining information and choosing methods to solve problems, as well as emotionally intelligent AI that can assist people in caring for others or motivating people. If you’re an “intrapreneur” and looking for the highest value-add inside a company, invest in AI that listens and speaks in a socially perceptive way as well as the next generation of insight discovery AI for analysis and decision support.
Economists worry a lot about productivity because it is one of the key drivers of economic growth.
The relationship between technology and productivity over the past few decades, particularly in developed economies, is often referred to as the "productivity paradox". The term was first coined by Erik Brynjolfsson in 1993 to explain the disconnect between the noticeable advances in the IT revolution and the lack of productivity growth, particularly in the US.
To give an overview, digital technologies have grown exponentially over the past 30 years. We've seen a rise in personal computing, the advent of the internet, the ubiquity of mobile devices, and more recently, advancements in AI and machine learning. Despite these significant technological advancements, productivity growth in many developed economies has been slow, a phenomenon often called "secular stagnation".
Several hypotheses have been proposed to explain this productivity paradox:
While many economists and policymakers expect that advancements in areas like AI and machine learning will eventually lead to a significant increase in productivity, the question remains as to when we will start to see these impacts reflected in productivity statistics.
A key point to make in discussions about productivity is that it’s not just productivity we care about: we actually care about doing more with less. This is where a different measure is important: total factor productivity. Productivity in its simplest form is about how much output you get from a certain amount of input. For example, if you work at a pizza shop and you make 10 pizzas in an hour, that's your productivity. If you figure out a way to make 15 pizzas in an hour instead, your productivity has increased.
But that's a pretty simple example, right? In the real world, making a product or offering a service usually involves many different types of inputs. In the case of our pizza shop, you don't just need one pizza maker. You need ingredients like dough, sauce, and cheese, and you also need equipment like ovens, pizza cutters, and maybe even a delivery car.
Total Factor Productivity (TFP) is a way to measure productivity that takes into account all these different inputs, not just one of them. So, when we talk about TFP, we're not just talking about how many pizzas one pizza maker can make in an hour. We're talking about how efficiently the entire pizza shop is using all of its resources (workers, ingredients, equipment) to produce pizzas.
Why is TFP important? Well, imagine if the pizza shop just hired more pizza makers to increase the number of pizzas produced per hour. That might look like a productivity increase at first, but it doesn't necessarily mean the shop is using its resources more efficiently. It's just using more resources.
But if the pizza shop can produce more pizzas without using any more workers, ingredients, or equipment – for example, by rearranging the kitchen to reduce the time pizza makers spend walking back and forth, or by tweaking the recipe to get more pizzas from the same amount of dough – that's an increase in TFP. It means the shop is getting better at turning its inputs into outputs. And in the long run, being able to make more with less is a key driver of economic growth and prosperity.
So while basic productivity measures can be helpful, TFP gives a more complete picture of how efficiently an economy is using all its resources, not just labor.
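For readers who like to see the arithmetic, here is a small Python sketch of the pizza shop. It assumes a Cobb-Douglas production function with a capital share of 0.3, which is a common textbook choice rather than anything stated in this article, and it folds ingredients into "capital" for simplicity. All numbers and function names are hypothetical.

```python
import math


def labor_productivity(output: float, workers: float) -> float:
    """Simple productivity: pizzas per worker per hour."""
    return output / workers


def total_factor_productivity(
    output: float, workers: float, capital: float, capital_share: float = 0.3
) -> float:
    """TFP under an assumed Cobb-Douglas form: output / (K^a * L^(1 - a))."""
    combined_inputs = (capital ** capital_share) * (workers ** (1.0 - capital_share))
    return output / combined_inputs


# Baseline shop: 3 workers, 2 ovens, 90 pizzas an hour (illustrative numbers).
baseline = total_factor_productivity(output=90, workers=3, capital=2)

# Double every input and output doubles too: more resources, same efficiency.
scaled_up = total_factor_productivity(output=180, workers=6, capital=4)

# Rearranged kitchen: same workers and ovens, but 110 pizzas an hour.
better_layout = total_factor_productivity(output=110, workers=3, capital=2)

print(f"baseline TFP:      {baseline:.2f}")
print(f"scaled-up TFP:     {scaled_up:.2f}")
print(f"better-layout TFP: {better_layout:.2f}")
print("scaling changed TFP?", not math.isclose(baseline, scaled_up))
```

Under these assumptions, doubling everything leaves TFP unchanged, while the better kitchen layout raises it. That is the distinction the pizza example is drawing: using more resources versus using the same resources better.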
There's another important metric, marginal productivity, and this is the one companies care about.
Remember that TFP is about how efficiently all inputs (or factors) of production—workers, ingredients, and equipment in our pizza shop—are used to produce output (pizzas). If we can make more pizzas without using any more of our inputs, that's an increase in TFP. It's a measure of our productivity considering all factors.
On the other hand, Marginal Productivity is about the additional output (extra pizzas) you get from adding just one more unit of a specific input (like one more worker), while keeping all other inputs constant.
For example, let's say our pizza shop currently has 3 workers and makes 90 pizzas an hour. That's 30 pizzas per worker per hour. If we hire one more worker and can now make 125 pizzas an hour, the marginal productivity of that fourth worker is 35 pizzas per hour (125 pizzas minus 90 pizzas). That's because we're looking at the extra output we got from adding that one additional worker, assuming nothing else changed.
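The arithmetic in this example can be written out as a short sketch. The function name `marginal_product` is just an illustrative label; the numbers are the ones from the pizza shop above.

```python
def marginal_product(output_before: float, output_after: float,
                     input_before: float, input_after: float) -> float:
    """Extra output per extra unit of one input, all other inputs held constant."""
    return (output_after - output_before) / (input_after - input_before)


# Pizza shop figures from the text: 3 workers make 90 pizzas, 4 workers make 125.
average_before = 90 / 3                       # pizzas per worker with 3 workers
mp_fourth_worker = marginal_product(90, 125, 3, 4)
average_after = 125 / 4                       # pizzas per worker with 4 workers

print("average before:", average_before)
print("marginal product of 4th worker:", mp_fourth_worker)
print("average after:", average_after)
```

Note that the marginal figure (35) and the average figure (30 before, 31.25 after) answer different questions: the average describes the whole team, while the marginal figure describes only the latest hire.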
The difference between TFP and Marginal Productivity is in what they're measuring. TFP is about overall productivity considering all inputs, while Marginal Productivity is about the additional output you get from increasing a single type of input. Economists use both to understand productivity and efficiency in different ways and contexts.
Understanding marginal productivity is especially important when a business is deciding whether to hire more workers, buy more machines, or invest in more materials. If the value of the additional output (the marginal productivity, priced at what the business earns on each extra unit) from hiring an additional worker is more than the cost of hiring that worker, then it makes sense to hire. Otherwise, it doesn't.
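As a rough sketch of that hiring rule: compare the value of the extra pizzas with the cost of the extra worker. The margin per pizza and the hourly wage below are hypothetical numbers chosen for illustration, and the function name `should_hire` is not from any library.

```python
def should_hire(marginal_pizzas_per_hour: float,
                margin_per_pizza: float,
                hourly_cost_of_worker: float) -> bool:
    """Hire only if the value of the extra output exceeds the worker's cost."""
    value_of_extra_output = marginal_pizzas_per_hour * margin_per_pizza
    return value_of_extra_output > hourly_cost_of_worker


# Assumed figures: the shop clears $2 per pizza after ingredients,
# and a worker costs $25 an hour all in.
print(should_hire(35, margin_per_pizza=2.0, hourly_cost_of_worker=25.0))
print(should_hire(5, margin_per_pizza=2.0, hourly_cost_of_worker=25.0))
```

With these assumed figures, a worker who adds 35 pizzas an hour is worth hiring, while one who adds only 5 is not, even though both raise total output.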
Our article in Quartz about how to make sense of job automation studies
On developer productivity and use of AI tools
On ChatGPT and productivity
On automate vs. augment and London cabs
On AI as a Prediction Machine and decisions as predictions and judgments
On the historical application of technology, productivity and automation
On How AI flattens organizations
On Occupational Impact of GenAI
Writing and Conversations About AI (Not Written by AI)