Is there an AI workslop problem?
The debate about whether large language models are transformative or overhyped occasionally produces real-world data. The data inevitably arrive with an eye-catching headline (e.g. “MIT report: 95% of generative AI pilots at companies are failing”) built on shaky methodology (in that case, 52 structured interviews at conferences, analysis of 300+ public AI initiatives, and surveys of 153 leaders). There’s something to be learned here, but the headline isn’t it.
The latest example is the Harvard Business Review article “AI-Generated ‘Workslop’ Is Destroying Productivity”. Workslop is “AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task”.
The authors offer some striking numbers. Forty percent of survey respondents had received workslop in the last month. Fifteen percent of the work they receive is workslop. Add in the nearly two hours each respondent spends dealing with it and you get a cost of $186 per employee per month, or $9 million annually for a 10,000-worker organisation. (That 10% of respondents said receiving workslop made colleagues seem more creative, capable, reliable, trustworthy and intelligent also suggests some survey respondent slop.)
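The arithmetic behind those figures is worth checking. Here is a back-of-envelope sketch; it is my reconstruction, not the authors’ published calculation, and the 40% multiplier is my inference, since it is what makes the per-employee and organisational numbers reconcile:

```python
# Back-of-envelope reconstruction of the article's cost figures.
# $186/month, 10,000 workers, and ~$9M/year come from the article;
# applying the cost only to the 40% who reported receiving workslop
# is my inference - it is the multiplier that makes the figures agree.

cost_per_employee_month = 186      # dollars, from the article
workers = 10_000                   # organisation size used in the article
share_receiving_workslop = 0.40    # 40% of respondents, from the survey

annual_cost = cost_per_employee_month * 12 * workers * share_receiving_workslop
print(f"Annual cost: ${annual_cost:,.0f}")  # $8,928,000, close to the quoted $9M

implied_hourly_rate = cost_per_employee_month / 2  # ~2 hours/month => ~$93/hour
print(f"Implied hourly rate: ${implied_hourly_rate:.0f}")
```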
So how did they reach these conclusions? The survey (available online when I wrote this) drew responses from 1,150 US-based full-time employees.
The first substantive question asks:
Have you received work content that you believe is AI-generated that looks like it completes a task at work, but is actually unhelpful, low quality, and/or seems like the sender didn’t put in enough effort?
It could have appeared in many different forms, including documents, slide decks, emails, and code. It may have looked good, but was overly long, hard to read, fancy, or sounded different than normal.
In the last month, in general, how much of the work you received from colleagues fits this description?
Respondents then report who sent the workslop and how much time they spent dealing with it in the past month. From this, the authors estimate workslop’s frequency and cost.
This is a problem: to estimate a cost you need the counterfactual. What would the respondents have received without AI? I have received plenty of human-generated workslop over the years and spent considerable time fixing it. I would certainly clock more than two hours per month on that.
The AI workslop might actually be an improvement: people who use AI to generate slop would likely have generated slop anyway.
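A minimal sketch of the adjustment the study skips. The hourly rate is the one implied above; the human-slop hours are an illustrative placeholder, not survey data:

```python
# Illustrative only: the counterfactual-adjusted cost the study skips.
# hours_human_slop is a made-up placeholder, not a figure from the survey.

hourly_rate = 93         # implied by $186 for ~2 hours (assumption, as above)
hours_ai_slop = 2.0      # hours/month dealing with AI workslop (from the survey)
hours_human_slop = 2.5   # hours/month fixing human slop absent AI (placeholder)

net_monthly_cost = (hours_ai_slop - hours_human_slop) * hourly_rate
print(f"Net cost per employee per month: ${net_monthly_cost:+.0f}")
# A negative result means AI workslop is cheaper to deal with than the
# human slop it replaced, i.e. a net improvement.
```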
The methodology has another gap: it only captures poor AI work. The study measures “bad things I noticed” whilst ignoring “good things I benefited from”. Without both, productivity estimates are meaningless. Why not also ask “Have you received AI-generated work content that is helpful and high-quality?” Or “How much time have you saved in the past month through using AI?”
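Answers to those questions, combined with the counterfactual above, would support a net estimate rather than a cost-only one. A sketch, with placeholder values for the unmeasured terms:

```python
# Illustrative: the full accounting a net-productivity claim would need.
# Only hours_ai_slop was measured; the other two inputs are placeholders.

hourly_rate = 93          # assumption, as above
hours_ai_slop = 2.0       # dealing with AI workslop (from the survey)
hours_human_slop = 2.5    # counterfactual human slop (placeholder)
hours_saved_by_ai = 3.0   # time saved through using AI (placeholder)

extra_slop_cost = (hours_ai_slop - hours_human_slop) * hourly_rate
net_value = hours_saved_by_ai * hourly_rate - extra_slop_cost
print(f"Net value per employee per month: ${net_value:+.0f}")
# The sign of this estimate rests entirely on the two unmeasured terms.
```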
Include both estimates and the story might flip to one about AI’s productivity benefits. Identifying AI-generated workslop shows there’s room for improvement (as does seeing human-generated workslop), but you need to measure both costs and benefits to judge whether generative AI delivers net value. Without the counterfactual, the study hasn’t measured productivity destruction at all.