AI Narrows Performance Gaps — But Does That Mean It Reduces Inequality?
What the most-cited experiments in AI economics can and cannot tell us about inequality
TL;DR: Several influential experiments have found that AI disproportionately helps less experienced and lower-performing workers, compressing the performance distribution within a given task. These studies have fueled considerable optimism that AI could reduce inequality. The findings are robust and important. However, we think the broader conclusions often drawn from them go beyond what the experiments were designed to show. By design, the experiments hold the set of tasks fixed, so they compare workers doing the same thing. They cannot say much about inequality between workers in different occupations, or between juniors and seniors, who typically do different tasks. They also mostly capture augmentation (workers being helped by AI) and say much less about automation, where AI replaces the worker altogether. These omissions matter, and we think they warrant caution in how we interpret the experimental evidence.
The great equalizer narrative
If you follow the debate on AI and the labor market, you have probably encountered a reassuring narrative: AI is the great equalizer. When workers get access to AI tools, the less skilled, less experienced ones benefit the most. The technology lifts the bottom of the performance distribution while leaving the top roughly where it was. This is the headline finding of a growing body of well-identified experimental work—Brynjolfsson, Li, and Raymond (2025) on customer-support agents; Noy and Zhang (2023) on professional writing tasks; Dell’Acqua et al. (2023) on management consultants; and Cruces et al. (2026) on education-based productivity gaps, among others. The results have been widely interpreted as evidence that AI will reduce economic inequality.
We find these studies very insightful and their findings are genuinely important. But we also think the leap from their results to conclusions about inequality is bigger than it might seem. The experimental designs, almost by construction, leave out several channels through which AI is likely to affect the distribution of economic outcomes. This post is an attempt to highlight some of those channels and why they matter.
What the experiments show
The pattern across these studies is remarkably consistent. Brynjolfsson et al., studying over 5,000 customer-support agents, find a 30% increase in issues resolved per hour for less experienced agents, while the most experienced workers see small gains in speed and small declines in quality. Noy and Zhang find that ChatGPT raises output quality by 0.45 standard deviations for college-educated professionals doing writing tasks, with significantly larger gains for workers who scored lower initially. Dell’Acqua et al., in a lab-in-the-field experiment with Boston Consulting Group consultants, report quality improvements of about 43% for consultants in the bottom half of initial performance versus 17% for those in the top half. And Cruces et al., comparing adults with high school versus postsecondary education on the same business problem-solving task, find that AI closes about 75% of the baseline productivity gap between the two education groups.
These are real, well-identified results. The question we want to raise is not whether they are correct, but whether they tell us what we often take them to tell us about inequality more broadly.
But workers don’t all do the same thing
The key design feature these studies share is that everyone does the same task. In Brynjolfsson et al., Noy and Zhang, and Dell’Acqua et al., the comparison is across workers of different skill or experience levels within the same occupation performing the same work. Cruces et al. go meaningfully further by comparing across education groups, but even there, all participants perform the same business problem-solving task. The experiments tell us that when workers of different ability levels do the same thing, AI helps lower-performing or less educated workers more.
But in the real economy, workers don’t all do the same thing. Different occupations involve fundamentally different tasks, and AI’s capacity to boost productivity varies enormously across them. Some tasks fall squarely within what current AI systems do well—drafting text, writing code, summarizing information—while others are barely touched. If AI dramatically boosts the productivity of high-paying knowledge work while leaving lower-paid service occupations largely unaffected, between-occupation inequality could widen even as within-task inequality narrows.
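To make the distinction between within-task and between-occupation inequality concrete, here is a toy calculation in Python. Every number in it is hypothetical: two invented occupations with four workers each, an assumed AI exposure rate per occupation, and an assumed rule under which the lowest performer in each occupation gains the most. It is a sketch of the logic only, not an estimate from any of the studies cited here, and output stands in for earnings, which is itself an assumption.

```python
from statistics import mean

# Hypothetical baseline output (arbitrary units) for four workers per occupation,
# listed from lowest to highest performer. All numbers are invented.
baseline = {
    "knowledge_work": [80.0, 90.0, 100.0, 110.0],  # higher-paid, highly AI-exposed
    "service_work": [40.0, 45.0, 50.0, 55.0],      # lower-paid, barely AI-exposed
}

# Assumed maximum proportional gain from AI in each occupation.
exposure = {"knowledge_work": 0.30, "service_work": 0.02}


def with_ai(outputs, rate):
    """Apply an assumed AI gain that falls linearly from `rate` for the lowest
    performer to `rate / 3` for the highest, so gains are largest at the bottom."""
    ranked = sorted(outputs)
    n = len(ranked)
    return [y * (1 + rate * (1 - 2 * i / (3 * (n - 1)))) for i, y in enumerate(ranked)]


def within_ratio(outputs):
    """Top-to-bottom output ratio inside one occupation (within-task spread)."""
    return max(outputs) / min(outputs)


def between_ratio(data):
    """Average knowledge-work output relative to average service-work output."""
    return mean(data["knowledge_work"]) / mean(data["service_work"])


after = {occ: with_ai(ys, exposure[occ]) for occ, ys in baseline.items()}

for occ in baseline:
    print(f"{occ}: within ratio {within_ratio(baseline[occ]):.2f} -> "
          f"{within_ratio(after[occ]):.2f}")
print(f"between occupations: {between_ratio(baseline):.2f} -> "
      f"{between_ratio(after):.2f}")
```

In this made-up example the top-to-bottom ratio falls inside both occupations, which is what a same-task experiment would detect, while the ratio of average output between the two occupations rises from 2.0.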
This isn’t just about differences across occupations: the same logic applies within occupations once you consider seniority. Junior and senior workers in the same profession typically do different tasks. A first-year associate at a law firm drafts memos; a partner manages client relationships and develops legal strategy. A junior analyst cleans data and builds models; a senior analyst interprets the results and advises clients. The tasks juniors tend to do (more routine, more codifiable, more text-based) are generally more exposed to AI than what seniors do. So an experiment that gives juniors and seniors the same task and finds that juniors gain more misses a key real-world dynamic: the tasks juniors typically perform may be the ones most susceptible to displacement.
Some evidence is starting to bear this out. In our recent work (Hosseini and Lichtinger, 2026), we find suggestive evidence that predicted productivity gains from generative AI (based on Eloundou et al., 2024) are larger for high-expertise occupations. In a separate paper (Hosseini and Lichtinger, 2025), we document that after firms adopt generative AI, their junior employment declines sharply relative to that of non-adopters, while senior employment is largely unaffected. The “Canaries” paper by Brynjolfsson, Chandar, and Chen (2025) finds similar patterns. In other words, the technology may benefit more senior and more skilled workers, even though the experimental literature suggests the opposite within a fixed task.
Augmentation is not the whole story
There is another limitation that is almost structural to the experimental approach. In each of these studies, workers are given AI as a tool to help them do their work. The worker still does the task, and AI assists. This is augmentation. But AI can also affect labor markets through automation: firms may decide not to assign a task to a worker at all, because AI can handle it directly (e.g., Acemoglu and Restrepo, 2018).
By construction, experiments that hand workers an AI tool and measure their output cannot capture this. You will not see, in a randomized trial, the firm’s decision to eliminate a position, to not open a role, or to shrink a team because AI makes some of the headcount redundant. Yet this is arguably the channel with the largest potential effects on inequality. If AI automates tasks previously done by lower-skilled workers, the productivity gains for remaining workers—however equalizing they may be—are beside the point for those who no longer have the job.
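The selection problem can be illustrated the same way. The sketch below, again with invented numbers, assumes that earnings equal output, that augmentation lets each worker close half of the gap to the top performer, and that the firm then automates the two lowest-ranked positions, with displaced workers falling to a hypothetical outside option. The function names and parameters (augment, automate, catch_up, outside_option) are illustrative only and do not correspond to any cited model.

```python
# Hypothetical earnings (arbitrary units) for five workers in one occupation,
# ranked from lowest to highest. Earnings are assumed to equal output.
baseline = [30.0, 40.0, 50.0, 60.0, 70.0]


def augment(earnings, catch_up=0.5):
    """Augmentation: each worker closes an assumed share `catch_up` of the gap
    to the top performer. This is what a randomized trial would observe."""
    top = max(earnings)
    return [y + catch_up * (top - y) for y in earnings]


def automate(earnings, n_automated=2, outside_option=20.0):
    """Automation: the firm hands the tasks of the `n_automated` lowest-ranked
    workers to AI; those workers fall to a hypothetical outside option."""
    ranked = sorted(earnings)
    retained = ranked[n_automated:]
    displaced = [outside_option] * n_automated
    return retained, displaced


def spread(earnings):
    """Top-to-bottom earnings ratio."""
    return max(earnings) / min(earnings)


trial = augment(baseline)
retained, displaced = automate(trial)

print(f"baseline:                {spread(baseline):.2f}")
print(f"trial (augmentation):    {spread(trial):.2f}")
print(f"retained workers only:   {spread(retained):.2f}")
print(f"retained plus displaced: {spread(retained + displaced):.2f}")
```

A trial run on this occupation would record the compression produced by augment, and a comparison restricted to retained workers would look more equal still. Once the displaced workers are counted, the spread is wider than at baseline.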
The labor supply side: a reason for cautious optimism
Finally, there is an additional dimension that is often missing from this discussion: the labor-supply side. Most discussion focuses on how AI changes the productivity of workers already in an occupation. But AI can also change who can enter an occupation. If AI lowers the expertise required to perform certain tasks, it could erode the entry barriers that protect incumbents in high-skill occupations, compressing wages by increasing competition for positions that were previously shielded by high human-capital requirements. As Autor and Thompson (2025) emphasize, expertise functions both as a source of productivity and as a barrier to entry, and AI may affect both dimensions simultaneously.
In Hosseini and Lichtinger (2026), we try to model this. We develop a Potential Supply Shift index that aims to predict how much AI will lower the expertise barriers to entering different occupations, and we embed it in a general equilibrium framework. We find that the productivity channel and the supply channel can push in opposite directions: predicted productivity gains tend to widen wage disparities, while predicted erosion of entry barriers compresses them (see also the very interesting work by Althoff and Reichardt, 2026). The net effect on inequality depends on which channel dominates, and the answer is unlikely to be uniform across the economy.
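For intuition on why the two channels can offset each other, a textbook two-group CES model is enough. This is not the general equilibrium framework or the Potential Supply Shift index from Hosseini and Lichtinger (2026); it is the standard canonical model, in which the relative wage of high-expertise (H) to other (L) workers is (A_H/A_L)^((sigma - 1)/sigma) times (H/L)^(-1/sigma). The elasticity sigma = 2, the 30% productivity gain, and the sizes of the supply shifts below are all assumptions chosen for illustration.

```python
def relative_wage(a_h, a_l, n_h, n_l, sigma):
    """Relative wage w_H / w_L in the canonical two-group CES model.

    Output is Y = [(a_l * n_l)**rho + (a_h * n_h)**rho] ** (1 / rho) with
    rho = (sigma - 1) / sigma, so the ratio of marginal products is
    (a_h / a_l)**rho * (n_h / n_l)**(-1 / sigma).
    """
    rho = (sigma - 1) / sigma
    return (a_h / a_l) ** rho * (n_h / n_l) ** (-1 / sigma)


SIGMA = 2.0  # assumed elasticity of substitution between the two groups


def scenario(productivity_gain, entry_shift, sigma=SIGMA):
    """Relative wage after AI raises a_h by `productivity_gain` (productivity
    channel) and moves a share `entry_shift` of a workforce that starts evenly
    split into H (supply channel). Baseline: a_h = a_l = 1, n_h = n_l = 0.5."""
    n_h = 0.5 + entry_shift
    n_l = 0.5 - entry_shift
    return relative_wage(1.0 + productivity_gain, 1.0, n_h, n_l, sigma)


# Hypothetical (productivity_gain, entry_shift) pairs.
cases = {
    "no AI": (0.0, 0.0),
    "productivity only": (0.3, 0.0),
    "supply only": (0.0, 0.1),
    "both, small supply shift": (0.3, 0.025),
    "both, large supply shift": (0.3, 0.1),
}
for name, (gain, shift) in cases.items():
    print(f"{name:26s} w_H / w_L = {scenario(gain, shift):.3f}")
```

With sigma above 1, raising A_H widens the relative wage while moving workers into H compresses it, and which effect wins depends on their relative size. With sigma below 1 the productivity channel would flip sign, so even the direction of that channel hinges on an assumed parameter.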
Summing up
To be clear: we are not dismissing the experimental evidence. The finding that AI compresses within-task performance distributions is robust and well-replicated, and it tells us something real and important about how the technology will affect the labor market. But the question of how AI affects overall inequality requires thinking about channels that these experiments were not designed to capture: the fact that occupations and seniority levels involve distinct tasks with varying AI exposure; that automation, not just augmentation, reshapes labor demand; and that the supply side of the labor market matters too.
The real picture is almost certainly more complex and more uneven than the headline narrative suggests. AI may well reduce some dimensions of inequality. But whether it is, on net, an equalizing force remains a genuinely open question—one that the current evidence, important as it is, does not yet answer.
References
Acemoglu, Daron, and Pascual Restrepo. 2018. “The Race between Man and Machine: Implications of Technology for Growth, Factor Shares, and Employment.” American Economic Review 108 (6): 1488–1542.
Althoff, Lukas, and Hugo Reichardt. 2026. “Task-Specific Technical Change and Comparative Advantage.” Working Paper.
Autor, David, and Neil Thompson. 2025. “Beyond the Race between Education and Technology.” Working Paper.
Brynjolfsson, Erik, Danielle Li, and Lindsey Raymond. 2025. “Generative AI at Work.” The Quarterly Journal of Economics 140 (2): 889–942.
Brynjolfsson, Erik, Bhavya Chandar, and Rui Chen. 2025. “Canaries in the Coal Mine?: Six Facts about the Recent Employment Effects of Artificial Intelligence.” Working Paper.
Cruces, Guillermo, Diego Fernández Meijide, Sebastian Galiani, Ramiro H. Gálvez, and María Lombardi. 2026. “Does Generative AI Narrow Education-Based Productivity Gaps? Evidence from a Randomized Experiment.” NBER Working Paper No. 34851.
Dell’Acqua, Fabrizio, Edward McFowland III, Ethan R. Mollick, Hila Lifshitz-Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R. Lakhani. 2023. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.” Harvard Business School Working Paper No. 24-013.
Eloundou, Tyna, Sam Manning, Pamela Mishkin, and Daniel Rock. 2024. “GPTs Are GPTs: Labor Market Impact Potential of LLMs.” Science 384 (6702): 1306–1308.
Hosseini Maasoum, Seyed Mahdi, and Guy Lichtinger. 2025. “Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data.” Working Paper.
Hosseini Maasoum, Seyed Mahdi, and Guy Lichtinger. 2026. “Generative AI and Occupational Entry Barriers: The Labor-Supply Channel of Technological Change.” Working Paper.
Noy, Shakked, and Whitney Zhang. 2023. “Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence.” Science 381 (6654): 187–192.



Excellent post, thank you. I think the selection effects that you present are important. I have two additional angles on (un)equalization that might be of interest and that offer further support for being somewhat cautious about expecting the democratization effects found in a range of studies to generalize to messier, real-life situations.
One angle is related, just framed differently. 1) You rightly show that a number of studies have found democratization effects, but these generally start from fairly well-defined problems. In our recent Academy of Management Learning & Education paper, we argue that the distinction between well-defined and ill-structured problems is likely relevant. In a well-defined problem, a) LLMs are more likely to get the answer right, and b) it is easier to tell whether the answer is correct. In ill-structured problems, the user is more likely to have to spend effort engaging with and iterating on the answer.
2) This links to the second angle and to the equalization we find in our experiment: low performers improve, while high performers do not (they actually decline). The effects are thus democratizing, but not in the way one would hope. We use cognitive load theory to explain the mechanism (see https://journals.aom.org/doi/10.5465/amle.2025.0029): while low performers know little and thus benefit from getting something, high performers already know something and now get even more information (extraneous load), which can be metacognitively challenging.
Hence, I suspect that while well-designed, limited studies may show positive effects, the typical user is often in a messier situation: getting a lot of information from LLMs and then struggling to integrate it, in particular if the user already has relevant information.