AI Agents, Mathematics, and Making Sense of Chaos
From Artificiality This Week * Our Gathering: Our Artificiality Summit 2025 will be held on October 23-25 in Bend, Oregon.
On April 9th, we presented our research update on the State of AI & Complex Change which focused on confusion about AI, status of AI adoption and deployments, anxiety about AI, and AI in the Wild.
Reach out with any questions. And stay tuned for next month's research briefing on Agentic AI. Subscribe now to join us!
Dave Edwards 0:00
Welcome to the April edition of our monthly updates. This one is a bit of a quarterly review, looking at data across the last few months related to a few different topics.
But before we start, we want to say we hope you all enjoyed the eclipse yesterday. There's a photo of the two of us sitting outside our home in Bend, Oregon. And for the other two, we're going to have a bit of a brag moment: those are from our daughter, who was witnessing the eclipse from NASA Mission Control in Houston. She sent those photos to us along with the comment, "Houston, we have an eclipse," so you get a sense of her sense of humor, anyway.
This month, we're talking about four different topics. One is confusion: there's a lot of data coming out that doesn't seem to make sense, it's conflicting, and so it's creating some level of confusion about what's really happening. We're going to talk about some of that. Then we're going to dig into some status on what's going on in the world of generative AI: who's using it, what are they using it for, and what are companies expecting. We're then going to talk about anxiety, because there's definitely rising anxiety about AI, its effect on jobs, its effect on life, on the planet, etc., and we'll tap into some of that. And finally, we have a few examples of AI in the wild. This month, they're examples of things not to do with AI, examples that should serve as warning signs for things that can go wrong.
So let's start with the confusion point.
If you want to find out how many companies are using generative AI, you can get wildly different results. Here are two different answers: 55% of companies are using generative AI according to McKinsey, and only 5% are using generative AI according to the US Census. The US Census has started a new survey to assess the impact of AI on the economy, something they're doing, or expecting to do, twice a year, and we're very interested in how they're assessing it, because they take such a broad view of the overall US economy. There's a wide delta between 55 and 5. Every time you see something like this, you wonder: did they ask the question differently, did they define AI differently, etc.? Our guess is that the real delta here is that the US Census is looking across every company and getting responses back from across the economy, whereas McKinsey is reaching out to companies they work with, and probably getting responses back from companies they work with. So they're looking at the largest companies, that slice. That doesn't make either of these answers invalid. But it does make it important to understand that when people say everybody's using it, but then you walk around and think that not everybody's really using it, both of those answers can be true, depending on which part of the population you're actually selecting.
Helen Edwards 3:02
There's another piece of precision in the US Census questions, which is that they ask about the production of goods and services, right?
Dave Edwards 3:08
They are asking about using generative AI in the production of goods and services, and that could be creating a different kind of answer than McKinsey's. True.
Okay, so let's look at what leaders expect and what they're doing about it. About half of leaders expect AI to deliver cost savings in 2024, and half of leaders anticipate cost savings in excess of 10%. But then let's look at what's actually happening. 6% of companies have trained more than 25% of their people in generative AI; so basically, very few have actually done any major training to effect this kind of cost savings. 45% of leaders don't have guidance or restrictions on AI use. 45% of leaders say they have limited or no confidence in their executive team's proficiency in generative AI. And perhaps the most important number: 90% are either waiting for generative AI to move beyond the hype or experimenting in small ways. This is all from the same BCG survey. So the question here is: how do half of leaders expect cost savings from generative AI this year if they're waiting for AI to move beyond the hype, or experimenting in small ways? Those two things don't seem like they can exist at the same time. That could certainly create some level of confusion. We're certainly confused.
Next, digging down a little deeper: a great survey sponsored by AWS, conducted by Tom Davenport and others, of chief data officers. What it showed is that 98% of CDOs agree that data strategy is crucial for getting value out of generative AI, and 57% haven't made the necessary changes. So more than half haven't made the changes to something that they say is crucial. Okay: it's really crucial, and they're not really doing it, or haven't succeeded at it yet. That helps.
That gets at some of this general value question. Wherever you turn, you're seeing that generative AI provides huge productivity gains, which it does; we've seen good research around task proficiency and efficiency, whether it's writing or code or a variety of other things. Generative AI can certainly provide productivity gains. But when looking across an organization, a lot of enterprises are not seeing enough value to do things like subscribe to Microsoft Copilot, whether that's Office Copilot or GitHub Copilot, because they're not seeing individual task gains aggregate up to overall worker efficiency or organizational productivity. This is going to be really important; it feeds into our research obsession with the world of workflows. And this contrast is really important.
One of the questions that comes up a lot is: how are people measuring ROI? If you're not seeing the ROI on Copilot, then how are you actually trying to measure it? What is that measurement? We've written about this in the last few weeks. This comes from a survey from a16z, the venture firm Andreessen Horowitz, so put the caveat there that these are organizations that a16z wants to talk to, and organizations that want to talk to a tech VC. But it's interesting that when asked how they're measuring ROI on LLM spend, more than half believe it's positive but aren't precisely measuring it. So of the companies that are spending money on LLMs, more than half, according to this survey at least, are saying: yeah, we just believe it's going to work and it's going to be worth it; we're not measuring. When you put that together with the question of, gee, we're not seeing the results, there are definitely organizations falling in the bucket that say we're not seeing it, so we're not spending; but of the ones that are spending, more than half just aren't even measuring. Neither one of those is a bad thing, by the way; this is just getting at some of the confusion.
Okay, so now let's look at some status, some other data that we have, about who's using it and why.
If you're looking at how many people are using ChatGPT, this is data from Pew Research that spans the six months between July 2023 and February 2024: significant growth in people using ChatGPT across all age brackets. The yellow line in the middle is all adults, which increased from 18% to 23%. Then you can see all the rest of the lines, youngest at the top down to the oldest. So we're seeing disproportionate usage of ChatGPT among younger adults, aged 18 to 29, versus those who are older, but every category is increasing, and by a pretty significant number across all age brackets.
What are they using it for? That's also interesting. We've seen a steady increase. This is March 2023 to July 2023 to February 2024, so it covers an earlier period than the previous chart; it's also from Pew, the same survey. There's a good increase in using it for learning something new, in the middle, and for entertainment, on the right. But one of the biggest increases is for tasks at work. All three of these categories started in roughly the same range, 8 to 11%, but tasks at work saw the biggest increase, from 8% to 20% of people using ChatGPT for tasks at work.
Let's look at it from an organizational perspective. This is more from the US Census. They broke things down into usage in the past six months versus expected usage in the next six months, and this is data from those who either have used or expect to use AI, so it excludes all the people who are not using AI and not expecting to use it. It's interesting that every category shows a steady increase. Look at the top line, marketing automation using AI: 28% said they were using it in the last six months, but now 37% expect to use it in the next six months, a really significant gain from one to the other. Data analytics grows from 16% up to 30%. These are pretty significant shifts coming through in the survey. Interestingly, though, it's not always consistent across company size and sector, and that's where the next two slides come from.
Company size is interesting. We looked at companies that are training people to use AI and creating new workflows for people to use AI. What we've done here is isolate the data based on whether it counts the number of firms or uses an employee-weighted answer, so essentially the number of companies versus the size of the companies. The blue and green show the total number of companies: roughly 20% growing to roughly 40% of companies expect to increase training and create new workflows in the next six months versus the last six. But if you look at the employee-weighted numbers, it's a significant decline, from 40-50% down to roughly 25%. What this says to us is that there's a lot more expectation of growth in creating training programs and new workflows at smaller companies, which is why the number-of-companies data is exploding. (A toy illustration of this weighting effect follows below.)
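To make that concrete, here's a minimal sketch with made-up numbers (not the Census figures): a handful of small adopters plus one large non-adopter yields a high firm-count share and a tiny employee-weighted share.

```python
# Hypothetical firms: (employee count, adopting AI?)
firms = [
    (10, True),     # several small firms adopting...
    (15, True),
    (12, True),
    (8, True),
    (5000, False),  # ...one large firm not adopting
]

# Firm-count share: each company counts once, regardless of size.
firm_share = sum(1 for _, adopting in firms if adopting) / len(firms)

# Employee-weighted share: each company counts by how many people it employs.
weighted_share = (
    sum(emp for emp, adopting in firms if adopting)
    / sum(emp for emp, _ in firms)
)

print(f"Share of firms adopting:  {firm_share:.0%}")      # 80%
print(f"Employee-weighted share:  {weighted_share:.0%}")  # ~1%
```

The same survey responses can honestly be reported as 80% or as 1%, depending on the weighting; that's the gap between the firm-count and employee-weighted lines.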
Dave Edwards 11:09
We also pulled out two sectors that are total anomalies versus every other sector when you look at current use of AI versus expected use of AI: retail and food service had numbers that are really quite different from everywhere else. What it's showing is basically the inverse of the last data: the number of companies expecting to use AI is growing, but not by much, while the employee-weighted number is skyrocketing, from 3-4% up to 20%. That's a really big difference. So that says to us that, at least among the survey respondents, the large companies in retail and food service, meaning the companies that employ lots of people, have plans to be using AI in the next six months. That could indicate we'll see more announcements like Wendy's putting generative AI in the drive-through, those kinds of scales of increased usage.
The reality today looks like companies are largely experimenting. This is data from Wiz, a cloud security company that looks across a very large number of public cloud instances to evaluate what's going on in them. While 70% of cloud environments have cloud-based managed AI services, their inference is that only 28% appear to be doing more than experimenting. So it's a small number of companies doing anything major; the majority are basically just experimenting. That kind of makes sense. There's a lot of discussion about AI, and a lot of success stories that big tech and all the big consulting firms are putting out. But when you look at how many people are actually using anything truly actively, and whether AI is pervasively moving across the company, there's a gap, and this data reinforces that perception.
Part of that's due to the challenges of AI adoption. This is data from that AWS survey of chief data officers. The biggest challenges: data quality, finding the right use cases, creating guardrails around the effective and responsible use of generative AI, and security and privacy of data. Those are the top four; there was a longer list, so hit us up if you'd like a link.
The current use of generative AI reinforces that idea of experimentation. 26% say employees are experimenting. 21% say experimentation is allowed, but with clear guidelines on usage. 19% have experimentation at a group level. So you get all this experiment, experiment, experiment. 16% say nobody can use it, or authorized use only. And just 6% have one or more generative AI use cases in production deployment. That's a very, very small percentage actually in production. So this should reinforce the idea that if you feel like there's a lot of excitement going on but it just feels like experimentation, this data shows that's true, at least based on the chief data officers who were queried.
In the same survey, that group of CDOs prioritized future use cases: customer operations and support chatbots, overall productivity, personal productivity, software engineering, and marketing and sales. Those are the big categories, and that echoes what we see when we're dealing with our clients and talking to them about where they see the opportunity and what they'd like to pursue. These all fit in the same zone.
Last data on who's using generative AI, and this is from McKinsey, so again that large-company kind of survey, but still quite useful and helpful. 88% say that non-technical employees are using generative AI, which says this has moved outside of the IT group. The technical side is 12% in total; we separated that out into 10% technical employees and 2% technical employees working adjacent to generative AI. It's helpful to understand that they see a very small number of people who are adjacent to generative AI versus the total population using it. And it says that the experimentation going on is happening outside of the IT group; it isn't isolated to programmers, developers, data scientists, and analysts trying these things out. They're trialing it and driving out usage in the rest of the organization.
Helen Edwards 15:56
So what are the challenges to adoption? Well, the main one is integration. LLMs have the potential to revolutionize so much of work, but integrating them is really quite challenging, and there are new challenges that need much more human involvement than is widely recognized at the moment. There's a lot of counterintuitive stuff happening: the very capabilities that make them so powerful create problems that mean humans have to come back in for judgment, expertise, and oversight. One of the most promising use cases, one that isn't strongly recognized yet but is becoming so, is the opportunity for generative AI to help with knowledge management and decision support. We've worked in this area for many years; decision support, having the right information at the right time for decision making in an organization, is a really complex, really difficult problem. And there's a promise that LLMs can help with that by being able to query across multiple datasets in natural language. Intuitively, it makes sense, and we see good examples of it happening. But it's really challenging. This data point from MIT Sloan is really fascinating: only 11% of data scientists report success in fine-tuning large language models with appropriate data. So it's very hard to do, and it's quite counterintuitive; it's sort of surprising how big this challenge is, in some ways, until you really think about what it takes to capture and curate the correct organizational knowledge. If you don't do it, you risk propagating inaccuracies forward into the future. And even though there are massive amounts of data that have been generated and collected by organizations, only a small fraction is actually suitable for fine-tuning or for RAG, and straight out, it requires a huge amount of human expertise in selecting and managing the data.
Dave Edwards 18:10
The quick story there: one of the highlight stories that came out in the early stages of generative AI was Morgan Stanley creating an LLM for their wealth management team. The idea was that they would create an LLM framework for wealth managers to be able to query research and come up with good ideas to share with clients. So it's not client facing; it's internal. It seemed like it happened really quickly; they spun this thing up really fast, and it was kind of an exciting moment, and everything similar since has kind of lagged. The interesting part here is twofold to me. One is the effort that it took to actually create this LLM and get all the data into it. It was significant. There are some interesting quotes from the CIO of Morgan Stanley's wealth management team talking about the process, and he talked about how they had to take all of that data and run it through the compliance team to be able to put it into the LLM, and that required a manual review by a compliance officer of every piece of data. If you stop and think about that, you go, wow, that's got to be a huge amount of work. And it is. I understand it: I used to be a research analyst at Morgan Stanley, I used to write those reports, and I know the compliance system. So that's a lot of work they must have had to go through in order to comply with regulations. But the flip side, thinking about it the other way: Morgan Stanley already has a compliance process. They already have a compliance team. The rules are already set for what it means for a research document to be compliant, so it can be published and shared with the public. Very few other organizations have that kind of structure around the rules, the regs, the team, and the process that's required to actually vet all of this unstructured data and put it out in the world. And that's a very big difference in terms of what everyone needs to be able to do, because you're not just going to suddenly take all your random documents, which you've never thought about whether you want to expose to everyone, and throw them into an LLM. The financial services industry clearly has an advantage, because they already have the rules and the process; every other organization has to create those rules and processes to be able to vet its data.
Helen Edwards 20:46
So one of the other surprising results here is that there's a new problem of output verification. It's kind of counterintuitive until you have spent time designing workflows and seeing how loopy they can become. One of the findings is that even when language models generate high-quality output, say on a programming task, because of the structure of an artificial language such as a coding language, the time saved can be offset by the additional effort and technical debt required to verify and debug the generated code. This is becoming clearer as people study the effect of using these tools in developer environments. The initial promise was that less expert, novice coders get a big productivity gain, proficiency, and job acceleration from using these tools, but then there's a flow-on effect as more expert coders have to debug and fix, or as technical debt increases. So it does suggest that language models aren't necessarily going to lead to straightforward productivity gains, and can instead transform the nature of work in unexpected ways. Gartner had some publications earlier this month about people not seeing productivity gains in coding. Well, they are; it's just that expectations are out of alignment. Say you take the top end of the range: somewhere between 50% and 200% more code generated in the same amount of time. But if coders are only coding, say, 16 to 35% of their time, that obviously drops the overall productivity gain (see the back-of-the-envelope sketch below). And then you've got this increase in the tasks required to debug and check and create different forms of workflow. So we've seen quite a lot of discussion around this whole coding area as a kind of special leading case for trying to understand some of the complexity in the nature of work, and it's going to be an interesting one to continue to track, because it is quite a complicated story. The next thing is the problem of output adjudication. As we've all experienced, these large language models generate conflicting outputs based on different prompts or different data inputs. Even the same group of people, trying to solve the same problem, sample from their own minds in a noisy way, so you end up with quite different prompts and different data inputs that push the models in different directions. That creates a new challenge for organizations: how do you adjudicate between competing viewpoints? That's hard enough when it's people, but when it's people and machines together, the task requires not just technical expertise but a significant amount of judgment and decision-making skill, which is acquired through experience, and how we gain that experience is changing as well. In terms of cost-benefit, there are a lot of good-enough tasks that are already automated, and the time and cost savings of adopting LLMs for those might just be undone by the other costs they impose. We've talked about the coding case here, but there's also a lot of automation that's already happened.
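As a rough illustration of that coding arithmetic, here's a back-of-the-envelope sketch with assumed numbers, not figures from any of the studies mentioned:

```python
# Amdahl's-law-style estimate: a big speedup on coding tasks yields a
# modest overall gain when coding is only a fraction of the job.
# Both numbers below are assumptions for illustration.

coding_fraction = 0.25  # assume coding is ~25% of a developer's week
speedup = 2.0           # assume code gets written ~2x faster with a tool

# Time to finish one unit of work, before and after adopting the tool
# (everything that isn't coding is unchanged).
time_before = 1.0
time_after = (1 - coding_fraction) + coding_fraction / speedup

overall_gain = time_before / time_after - 1
print(f"Overall productivity gain: {overall_gain:.0%}")  # ~14%
```

Under these assumptions, doubling coding speed lifts overall output by roughly 14%, before subtracting any new verification and debugging work.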
A lot of that existing automation is kind of good enough. And, you know, for our money, a lot of the discussion about using these tools in call centers gets a question mark from us, for a couple of reasons. One is that call centers are already highly automated, with a lot of scripting. And there is always conjecture about just how much companies care about increasing customer service quality, as opposed to customer experience. Customer experience is definitely more greenfield, but in the core foundations of call centers and customer service, there's a question mark about just how much adoption we'll see. There's also the factor of comparative advantage. All resources have constraints: humans have constraints, and AI has constraints. So the compute and inference resources of AI may go to where AI has the highest comparative advantage, rather than wherever it merely beats a human. We just don't know those things yet; the case studies are still working themselves out. The final one here is really about job transformations: the impact is going to be much more complex than simple replacement. I doubt that's news to any of you. But many jobs are so dynamic, with multiple tasks that are all variable and unpredictable, that there's a last mile: large language models can do great things to get you 50, 60, 70% of the way there, but that last piece is just so much more difficult and complex. There are definitely tasks that we just don't know how to automate in a way that isn't the kind of automation that calls the human back in for something that doesn't really extend their skills and leaves the machine still wanting. That is still an open topic. Shall we take some questions now, or at the end?
Dave Edwards 27:22
I'll address one question right now: the question about the Morgan Stanley case study and whether everyone has to play by the same rules. Yes and no. The no part is that Morgan Stanley, because it's in the financial services business, has compliance requirements from the SEC and others. So they have a rule set that is defined for their industry; everyone else in the same industry will have to do the same thing, but other companies won't necessarily have to comply with that sort of regulatory requirement. That's both good and bad. I think the advantage for financial services is that they have regulations, they know what they are, those are very clear, and they have processes, so it's relatively easy for them to determine whether some information can go into an LLM; if the LLM then generates information from those research reports, that's all good. Everybody else has to make some new decisions about what information can go into an LLM, because you're not sure whether you want the output to come from it. Think about the worst-case scenario: just look at any major tech company lawsuit that's going on, where the press coverage sits there with some number of emails quoting particular people. The best are the Apple lawsuits, where you get emails going back and forth between Phil Schiller and Tim Cook about something. Having all of that in an LLM sure seems like a bad idea. Okay, so that's easy. Then you've got stuff that's publicly available on your website; that seems like maybe a good idea, although we have some examples of how that can go wrong. But then there's this vast quantity of information somewhere in the middle. Where is it a good idea to make it available through an LLM, whether for customers or for employees inside the enterprise? There's another level of challenge that we see companies going through. They start by saying, okay, we've got all of this structured data, which has been decontextualized, and they try to use an LLM to help people find things in what are essentially rows and columns of data. Then they've got all of this unstructured data they want to bring in, which includes context, which makes it much more interesting and rich. But then you have to make the distinction of which of that information each employee should be able to access, because pretty much everybody in the company has a different level of information they're allowed to access. That makes it hard to figure out how you're going to train the LLM so that it only gives responses based on what that employee should be able to access. This is a dynamic we've been really interested in for a while. Rewind back and listen to the podcast interview we had with Arvind, the CEO of Glean, if you're interested in this, because Glean had a pretty interesting approach. Glean's basic premise and structure was enterprise search across an organization's information and data that respected the individual employee's access; that was the starting premise. We interviewed Arvind as this whole generative chat window and LLM world was all in the mix.
Their approach, as things started up for them, was to look at the employee and what they're able to access, go fetch that information, and then generate a response from it. So it's a different way of thinking about where the LLM should sit and what it is or isn't trained on. But this is clearly a problem, right? You want to be able to train on everything and have this grand LLM, but an LLM on its own has no ability to filter out what an employee can or shouldn't have access to.
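For illustration, here's a minimal sketch of what retrieval that respects employee access might look like. Everything in it (the document type, the toy relevance scoring, the prompt assembly) is a hypothetical placeholder, not Glean's actual system or API:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: set  # groups permitted to read this document

def retrieve(query, user_groups, index, k=3):
    """Filter by access control *before* ranking, so the model never
    sees documents the user couldn't open themselves."""
    visible = [d for d in index if d.allowed_groups & user_groups]
    # Toy relevance score: how many query words appear in the text.
    words = query.lower().split()
    ranked = sorted(visible, key=lambda d: -sum(w in d.text.lower() for w in words))
    return ranked[:k]

def answer(query, user_groups, index):
    context = "\n".join(d.text for d in retrieve(query, user_groups, index))
    # In a real system this prompt would be sent to an LLM;
    # here we just return it to show what the model would see.
    return f"Answer using only this context:\n{context}\n\nQ: {query}"

index = [
    Doc("handbook", "PTO policy: employees accrue 15 days per year.", {"all-staff"}),
    Doc("comp-plan", "Executive compensation bands for 2024.", {"hr", "exec"}),
]
print(answer("How much PTO do I accrue?", {"all-staff"}, index))
```

The design point is that the permission filter runs before retrieval and generation, so the model only ever sees documents the user could already open, rather than trying to teach one grand LLM what each employee is allowed to know.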
Helen Edwards 31:23
That's a hard problem, hard problem.
Dave Edwards 31:26
Thanks for the question.
Okay, so let's talk about some broader society questions, with data that looks at anxiety around AI. You're not imagining it: surveys say people are getting more concerned. This is more data from Pew Research. The bottom blue bars are what to look at, those who are more concerned than excited. There wasn't much change from 2021 to 2022, but in 2023 there was a pretty significant jump, from 38% to 52% of people being more concerned than excited. And there's a big drop in those more excited than concerned, from 15% to 10%. That's as ChatGPT, and all the other generative AI tools too, really started to move across society. People are now much more concerned than excited about AI.
Now, there are differences in who's more excited and who's not, so let's look through some of that data. Leaders are more optimistic than the rest of their organizations. This is a very common pattern across organizations we work with; it's been a stable pattern for about a decade, and we see it across companies and across higher education institutions. This is survey data from BCG. Leaders are 62% optimistic and 22% concerned (there's clearly a neither-or-no-answer response that would make it add up to 100). Managers are 54% optimistic, frontline workers 42%. So leaders are really excited: they're optimistic and less concerned. The frontline is about 50/50 between optimism and concern. This is particularly a problem in our minds, because we believe, and we also see, that the best ideas come from those on the front line. The people who come up with the best use cases are the ones actually using the tools day to day. But they're also the ones using the tools day to day who are getting more concerned about what the effect of these tools might be.
Helen Edwards 33:45
Or there's more opportunity for subversion and avoidance. Yeah.
Dave Edwards 33:50
There are a couple of other ways of slicing this demographically. There's a handful of slides here from the American Psychological Association, which are quite interesting. Look at the percentage of workers who are worried that AI may make some or all of their job duties obsolete, by age bracket. You can see that the most worried are the 18-to-25 and 26-to-43 brackets, and then it declines in the older groups. Now, that could be because those who are 26 years old have more years ahead of them to be worried about displacement, whereas those who are above 65 have fewer. It could also be that those younger age brackets are using the tools a lot more; we saw that in earlier data. So you see the power of it, you actually feel the effect, and you're more worried. We don't really know why, but it's interesting, because this is a good way of thinking across your organization and teams: when you walk into a room, who do you think is most likely to be using it, but also who's most likely to be worried, and why?
More data from the APA on the percentage of workers who are worried, again significantly: 50% of Black and Hispanic workers are worried that AI will make some or all of their job duties obsolete, versus 34% of white workers. So definitely much more concern from people of color.
Those who have less education are also more worried. Of those with a high school degree or less, 44% are worried that AI will make some or all of their job duties obsolete. Of those with a college degree or beyond, only 34% are worried.
Helen Edwards 35:52
Right. And to your question: do you think the difference in optimism between managers and the frontline is due to outsized expectations pushed from the top, or the inadequacy of real working use cases having to face the hype at the bottom? I think the answer is a bit of both. And I think it also depends quite a lot on the context of the industry. For example, in education, one of the big explanatory factors for this difference is that instructors and teachers are completely exhausted after the pandemic, and tackling generative AI and ChatGPT in the classroom and whatnot is just so hard, because they're so tired and overwhelmed; it's yet another thing to worry about. We've been hearing that for a couple of months, but a new report out of an MIT education group is just so clear that this is really, fundamentally, what's driving the difference in education: people are burned out, essentially. Interestingly, we don't see this in medicine. In health, the adoption rates are really high on the front line, because doctors are essentially using these tools individually, to answer patient queries and emails. So there's a real context-dependent adoption rate on the front line, and a lot of it is how much you can get your head around and use for yourself personally. There are more instructors using ChatGPT for lesson planning and marking than for completely reinventing their entire curriculum, for example, the same as there are more doctors using ChatGPT just for themselves than across a broader practice area. So it is context dependent, and it sits at the intersection between how individuals are using it, how the group is using it for larger-scale problem solving, and the question of: what's the instant productivity hit I can get that makes my day easier, because I just don't have time to think about the bigger picture?
Dave Edwards 38:26
Let's look at a couple of other things here. I find this interesting: this APA question was about what people think about monitoring, essentially surveillance. Although that's not directly a question about AI, you can infer some relationship to AI, because AI has some level of monitoring going on. There are three different questions they asked: one, whether monitoring improves productivity; two, whether it improves workplace experience; and three, whether it protects my safety. And you can see, going from the individual contributor through to upper management, that upper management is definitely much more interested in monitoring: they think it protects safety, improves workplace experience, and improves productivity a lot more. So there's a big delta between what the individual contributors and frontline workers actually think and what upper management thinks.
This is data from BCG around expectations: 89% of executives believe generative AI will create new roles, 74% believe generative AI will require significant change management, and 46% of workers are expected to need to be reskilled in the next three years. No wonder people are anxious. It's colossal, right? That's a huge change that leaders are out there talking about and professing. And if you take a moment, it's about 15 minutes, to watch the Jon Stewart segment from The Daily Show about AI this week, you'll notice the quotes at the end from the big tech leaders, who will eventually admit that generative AI is a labor replacement technology. This shows some of that: the expectation is that there will be quite a lot of disruption, and that's quite a lot of challenge and problem.
Helen Edwards 40:29
And if you think about the US context, it was actually displayed quite well in the Jon Stewart clip, but it's something we've looked at over the years: there just isn't a history in the US of retraining, much more so in Europe and in other parts of the world. So it is completely rational to be highly skeptical and cynical about companies' stated desires to retrain workers. There's a degree of cynicism around this that I think drives some of the anxiety as well.
Dave Edwards 41:04
Interestingly, when you get to this level, and this is again BCG, the vast majority of leaders they surveyed say they're not expecting any change to employment: that's the 87% at the bottom, and it's split 6% either way between companies expecting an increase in employment and a decrease. Not really sure what to think about all that, but it's interesting to see where people's thinking is: huge change, new roles, some ups, some downs, and yet mostly no change. We'll see how this maps out; I think these numbers will likely be more dynamic over the next year or two.
Helen Edwards 41:45
I'll close with this wonderful quote: "The optimistic idea that lower level employees will be empowered by access to large language models to take on more of the tasks of high level employees requires particularly heroic assumptions." There's a lot of meaning packed into that quote. This entire process of adopting different contextually aware intelligences into jobs is going to be quite transformative, but it's very, very early days. One of the things that really struck us in pulling together this month's data is just how confused the situation is, and how many different pushes and pulls there are, from highly theoretical standpoints down to empirical studies that still need to be strengthened, expanded, and scaled. We're very much at an early stage with this.
Dave Edwards 42:55
I'm just going to read off a question here, because I think it's interesting and I'd like to tap into it now. It says: "I understand many predictions show the high-knowledge jobs get replaced first, and I understand it for jobs like coaching and social media marketing, which don't necessarily require training on private corporate data. Beyond that, though, I now wonder if the knowledge capture challenge, plus the rapid development of human-like robots, changes the prediction about which jobs are most likely to be fully replaced, as many in-person light-labor jobs don't require as much access to specialized data. Any thoughts on this?" Quite a lot of thoughts, actually. Several years ago, we did a deep dive into the BLS data, the US Bureau of Labor Statistics data. Basically all of these predictions about different jobs, and whether they're going to be replaced or not, start with this BLS data, which looks across the entire economy and every kind of job and what you have to do and not do, etc. We did this several years ago, when the original Oxford study came out talking about 47% of jobs being somewhat automated. It got misquoted a bunch as saying 47% of jobs would be replaced, but it was basically job disruption of 47%. We looked at that, and a couple of important things come out. One is that what we're seeing now actually agrees with what we predicted several years ago, which is that some portions of jobs can be accelerated, augmented, or replaced by AI, but there's too much else that people do to actually remove the role. That's what's happening a lot right now with knowledge workers. Yes, you can accelerate some of a coder's tasks, because they can do them faster, but that doesn't mean you can replace the coder, because there's so much else they have to do, and all kinds of other people they have to work with. Creating the code can be faster, but that doesn't mean you can check it into the code base and move through your agile sprint faster, because so much else has to happen across an entire team. There's also the fact that I might be able to write an email faster, or put up a social media post faster, but then there's everything else I do in my job that can't be replaced. So we're starting to see how people are looking at recrafting workflows, which sounds good, but really part of the goal is to recraft workflows to be able to reduce headcount. Because if you look across an entire company and everybody saves 5% of their time, it doesn't mean the organization is getting any value out of it, unless people are using that 5% for something of additional value; they have to be able to do more with that 5%. Otherwise everybody's just saving 5% of their time, and you can't get rid of 5% of a person. The flip side is the robots. What we found in the past, and I think it's still true, is that humans are remarkable in what we can actually do physically. The best reflection we had came from spending a bit of time with one of the true gurus of robotics at Berkeley.
Having him explain the challenge of creating a robot that can actually do what a human can do, especially with our hands, was striking: the fact that we can pick up a glass that's light and not break it, and then pick up a heavy thing and not drop it. That kind of responsiveness is very hard for robotics to figure out. So we're seeing some really interesting robotics, but what we're seeing is the success stories; we're not seeing everything else that the robotics world is finding too difficult to accomplish.
Helen Edwards 46:56
Yeah, and it's an important question, because there have been significant advances in robotics in the last couple of years, much more so than in the years before, partly because there are now datasets that basically allow robots to tap into more spatial reasoning via language; that's essentially one way to think about it, and it's a complex story. But nevertheless, there is definitely more progress in robotics because of the language models than there has been in the past. Having said that, just to go to the theory: the people who worry the most, who have their careers devoted to thinking about labor replacement, people like David Autor at MIT, his current thinking is really quite interesting on this. It's that we're facing not a shortage of jobs but a shortage of workers. And in our conversations in the last couple of months with quite large infrastructure-driven, labor-driven workplaces, they concur: the demographics of the aging workforce, and the fact that there are a bunch of people who don't want to do manual labor, are actually changing faster than the automation is changing. So the lack of workers for those roles is actually driving the development of automation for those roles, more than we would have thought ten years ago, when it was all about whether it was economic to replace humans with machines. So again, the story is quite complex and nuanced, because at the same time, what's happening in these roles is a degree of delamination of jobs. The poster child for delamination of jobs is the creation of physician assistants, which put a slice of lower expertise between the physician and the nurse, essentially, and that delamination allowed for the creation of a new class of workers, sort of an upper-middle-class type of skill set. The theory is that a lot more of that will happen as artificial intelligence, robotics included, allows less skilled people to take on more expert-level roles and delaminates more jobs. Now, it's a theoretical argument, and you can point to a couple of examples in the past, but in some ways it's a hopeful argument, because it closes the loop between not having enough workers and having workers who want to do more meaningful work, rather than suppression of wages. That's a more hopeful story, but it's very nascent and it's very early. So it's a really fascinating picture to keep watching, and it takes years to follow all of these quite theoretical economic arguments.
Dave Edwards 50:44
So in the last couple of minutes, I'm going to jump into three examples of AI out in the wild that caught our attention.
First: the New York City government created a chatbot to answer people's questions about rules and regulations in New York City, and it does a pretty lousy job of sticking to the facts. There's a great exposé, a combined work by The Markup and The City, on the NYC chatbot. You can see things like, top left, asking about landlords: do they have to accept tenants on rental assistance, or accept Section 8 vouchers? The chatbot says no; the reality is that landlords can't discriminate by source of income. Can I take a cut of my workers' tips? The chatbot says yes, you can; the reality is bosses can't take any tips. Do I have to inform staff about schedule changes? The chatbot says no; the answer is yes. Can I make my store cashless? The chatbot says yes; in reality, stores have been required to accept cash as payment since 2020. It's a great example of the problem, right? These are all well-defined, well-known pieces of information, so much so that the journalists were able to find the inaccuracies. But the chatbot is clearly not sticking to the facts.
Helen Edwards 52:12
And it's so interesting, because there's a well-tested feature of this called sycophancy, and it depends on the way you ask a question, right? Look at some of the bias: "do I have to inform staff about schedule changes?" You can imagine how that's received by the large language model, the tone of "do I have to?" Whereas if you said, "how should I best inform my staff?", you can think about different framings of those queries where the language model is more incentivized to come up with a truthful answer. So it's not just that they hallucinate; they follow the intent of the question, of the prompt, and that can skew the way it's answered as well.
Dave Edwards 53:04
Second: the Washington State Lottery created an app called Test Drive a Win, where you could describe your dream if you won the lottery. A woman said her dream was to swim with sharks, and the app asked her to upload a photo of herself. It then created a photo saying, hey, this is you after winning the lottery, and it was her on the beach with her face on top of a topless body. Clearly a problem that guardrails should have prevented, but they didn't.
And the last one is from big tech: Google's Search Generative Experience, which you can access through Google Labs, has been shown to be propagating a bunch of spam results. When you search for, on the left, "used auto parts on Craigslist" or "pitbull puppy for sale on Craigslist," spammers have figured out how to insert their links into what gets generated as a result in the generative search experience. You can see, when you look at the URLs, that these are clearly spam results, but they're coming through as results in SGE. Anyway, thank you so much for joining us; we've gone a little over time. We'll be doing this again next month. I can't remember the exact date, I think it's the week of May 12, and the next topic will be agentic AI: less data, and more talking about what's actually going on in developing agents, which are basically a way to use AI to do a multi-step process. That'll be our deep dive. Let us know what you thought; we'd love to hear from you. You know how to get in touch with us. Please do, and thanks for being with us.
Helen Edwards 54:57
Thank you
Transcribed by https://otter.ai
The Artificiality Weekend Briefing: About AI, Not Written by AI