AI Coding Tools Can Reduce Productivity: Study Results
Many tech leaders claim that AI won’t replace developers but will instead boost their productivity, letting them complete more tasks. New research from METR, however, suggests this isn’t always the case. A blog post on Second Thoughts digs into the reasons for the discrepancy.
AI-powered coding tools are generating immense buzz. Startups are launching with minimal engineering teams, non-programmers are “vibe-coding” entire apps, and the job market for junior developers is tightening fast.
But according to a rigorously controlled study by METR (conducted in the spring of 2025), there’s at least one group for whom AI tools didn’t help at all: experienced developers working on large, mature codebases. In fact, those developers turned out to be 19% slower when AI was allowed. Ironically, they believed AI had sped them up by about 20%.
If you take away just one thing from this study, it should probably be this: when developers report that AI has sped up their work, they might actually be wrong!
Study Methodology
The experiment was tightly designed as a real-world randomized controlled trial. Here’s how it worked:
- 16 experienced developers from major open-source projects participated.
- Each selected a set of real “to-do” tasks from their daily workload, broken into 1–2 hour chunks (246 tasks total).
- Developers estimated how long each task would take with and without AI.
- Tasks were randomly assigned to two groups: “AI Allowed” (developers could use any tools) or “AI Disallowed” (no AI assistants permitted).
- Developers recorded their screens while working, and logged actual time spent after each task.
- For AI-allowed tasks, they also estimated how much time they believed AI had saved them.
- Crucially, developers didn’t know beforehand whether AI would be permitted for a task – eliminating bias in time estimates.
When researchers compared estimated vs. actual completion times, tasks in the AI-allowed group overran their estimates by significantly more than tasks in the non-AI group – working out to a roughly 19% slowdown when AI was allowed.
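For intuition, here is a minimal sketch of how a slowdown figure like this could be derived from estimate/actual pairs. The numbers below are made up, and METR’s actual analysis used a more careful regression model – treat this purely as an illustration.

```python
import math
import statistics

# Hypothetical (estimated_hours, actual_hours) pairs for completed tasks.
ai_allowed    = [(1.0, 1.50), (2.0, 2.60), (1.5, 1.68)]
ai_disallowed = [(1.0, 1.10), (2.0, 2.20), (1.5, 1.60)]

def mean_log_overrun(tasks):
    """Average log(actual / estimated); log ratios tame skewed task durations."""
    return statistics.mean(math.log(actual / estimated) for estimated, actual in tasks)

# Geometric-mean overrun of the AI group relative to the control group,
# expressed as a relative slowdown.
slowdown = math.exp(mean_log_overrun(ai_allowed) - mean_log_overrun(ai_disallowed)) - 1
print(f"AI-allowed tasks ran about {slowdown:.0%} longer relative to estimates")
# -> AI-allowed tasks ran about 19% longer relative to estimates
```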
Why Did This Happen?
The researchers explored a wide range of potential factors behind the unexpected results. Here’s what they found:
- John Henry Effect – Developers might have tried harder on non-AI tasks to “beat the machine.” But if that were the case, the effect should have faded over time (each participant completed around 15 tasks). No such trend was observed.
- Insufficient AI usage – It’s possible some participants underused AI, even when allowed. However, this would only explain a lack of improvement, not a drop in performance. And in fact, 84% of screen recordings in the “AI allowed” group showed at least some AI use.
- Overly optimistic time estimates – Developers often underestimate how long tasks will take. But because these estimates were made before knowing whether AI would be allowed, the bias would apply equally to both groups.
- Task ambiguity, outdated tooling, or inaccurate time tracking – None of these meaningfully skewed the results.
Having ruled those out, the researchers identified several factors that may have contributed to the observed drop in productivity:
- Overuse of AI – Some participants admitted they experimented more when AI was available, applying it even where it wasn’t needed; this happened on nearly half of all tasks.
- Lack of experience with AI tools – Only 44% of participants had extensive experience with tools like Cursor.
- Trying unfamiliar tools – Some participants used new AI tools they hadn’t worked with before, which likely slowed them down.
It’s possible that developers using AI simply expanded the scope of their tasks – for example, writing code to handle additional edge cases, adding new features, or spending more time on testing and documentation. As potential evidence, the researchers point out that on AI-allowed tasks developers wrote 47% more lines of code (relative to the expected task size) than on tasks without AI. The study’s authors caution, however, that even if this difference is real, it cuts both ways: more lines could mean more thorough code and better validation, or simply bloat, duplication, and unnecessary detail.
Even though developers spent more time with AI, this doesn’t necessarily mean they exerted more effort. Reviewing or fixing code is often (though not always) easier than writing it from scratch, and the waiting time for code generation can be used for resting or multitasking.
In conclusion, the impact of AI tools might not be as bad as it seems: part of the observed 19% slowdown could reflect more thorough work or reduced mental effort, and some of it came from participants who over-experimented with AI precisely because they were taking part in a study. But the fact remains: AI tools didn’t meaningfully help these developers, and may even have slowed them down.
The Weak Spots of Today’s AI Tools
- Mature, large-scale codebases. These projects were over 10 years old and had millions of lines of code. AI struggles to operate effectively at this scale.
- Tacit knowledge. Developers rely on undocumented understanding of the codebase. One participant said AI “acts like a newcomer to the repo.” Another noted it “doesn’t know which data interacts where or why this line matters for backward compatibility.”
- Highly skilled developers. Participants weren’t average coders – they were seasoned experts. Competing with them is a tall order for AI.
- Strict coding standards. Most open-source projects studied had rigorous style guides. AI didn’t always follow them, leading to extra manual revisions.
Final Takeaways
The observed 19% slowdown might seem discouraging – but it occurred in a tough environment: skilled developers, complex codebases, high quality demands. Some of the slowdown might reflect more thorough work or less mental strain, and some was due to overuse of AI during the study. And of course, the tools themselves are likely to improve over time.
This study shouldn’t be seen as a refutation of the idea that AI could drive explosive growth in software development by 2027. Rather, it suggests that the strongest feedback loops in AI evolution may still be further off than we expected — especially when it comes to large, complex codebases, as opposed to smaller side projects where AI tends to perform better.
Developers were 19% slower with AI, but thought they were 20% faster. This study offers a rare dose of objective data in a space often dominated by hype and gut feeling.
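To put the size of that miscalibration in perspective, here’s a back-of-the-envelope calculation (ours, not the study’s):

```python
# Back-of-the-envelope arithmetic, not a figure from the study: for a task
# that would take 1.0 unit of time without AI, the measured result implies
# ~1.19 units with AI, while developers *felt* it took ~0.8 units.
baseline = 1.0
actual_with_ai = baseline * 1.19       # measured: 19% slower
perceived_with_ai = baseline * 0.80    # self-reported: 20% faster
print(f"Perception was off by a factor of {actual_with_ai / perceived_with_ai:.2f}")
# -> Perception was off by a factor of 1.49
```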