When Knowledge is the Bottleneck
A Knowledge-Centric Interpretation of METR's AI Productivity Study
Introduction
Modern AI tools promise to speed up software development. We’ve all seen demos of code autocomplete and AI-powered refactoring that make typing look nearly instant. Yet a recent METR field study, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” found the opposite: experienced open-source developers actually took 19 percent longer to finish tasks when they used modern AI agents.
What’s going on? To answer this question, we suggest adopting the Knowledge-Centric Perspective on software development: seeing knowledge as the fuel that drives the software development engine. Central to this perspective is the concept of the knowledge gap: the difference between what a developer knows and what they need to know to complete a task effectively. Every feature, bug fix, or code review requires bridging that gap. Real software development work is more than keystrokes; it is about closing knowledge gaps.
This view treats GenAI as a source of knowledge rather than a magic typing machine, and it explains the study’s surprising findings.
In this article, we show that AI is valuable only if it helps fill in missing knowledge. In scenarios where AI provides the right facts or examples, developers gain speed. But when AI lacks the needed deep, project-specific context, it can actually slow them down.
We’ll see where AI shines and where it stumbles, and outline how to use it wisely to close the right gaps.
1. The Knowledge-Centric Perspective
Every software task—whether it’s building a new feature, fixing a bug, or reviewing a pull request—is a journey. The developer starts at Point A, armed with what they already know: their experience, mental model of the system, and familiarity with the tools. To complete the task, they must reach Point B—the place where they understand exactly what to do and how to do it.
The distance between A and B is called the knowledge gap. Sometimes the gap is small: “What’s the right parameter for this function?” Other times, it’s big: “How do we update this legacy module without breaking compatibility?”
Getting from A to B isn’t just about typing. It involves reading documentation, asking teammates, thinking through trade-offs, sketching designs, and running experiments. This invisible, behind-the-scenes work is part of what we call the Knowledge Discovery Process, in which invisible knowledge (tacit insights, undocumented best practices) is transformed into visible, tangible output. It’s what developers spend most of their time doing, even if it doesn’t show up in the diff.
From this perspective, productivity isn’t about speed of output — it’s about speed of understanding. A developer who gets to the right solution faster, with fewer detours, is more productive — even if they type at the same pace.
Viewed from the Knowledge-Centric Perspective, GenAI is a source of knowledge. It competes with prior experience, StackOverflow, Google Search, books, and advice from colleagues. What GenAI fundamentally does is help developers efficiently bridge the gap between what they know and what they need to know to complete tasks effectively.
What sets GenAI apart is the interactive, context-aware way it helps developers bridge knowledge gaps. Its real-time suggestions, grounded in the current coding context, offer a more seamless and integrated experience than searching StackOverflow or Google, with less interruption to the developer’s workflow. Because GenAI sees the developer’s current project and codebase, it can offer more relevant suggestions than generic search tools or less-integrated assistants. It may even flatten the learning curve by exposing developers to best practices and new coding patterns in their own working context, rather than through the more passive reading of documentation or forums.
This is where GenAI tools have the potential to help — but only if they reduce the time it takes to close the knowledge gap. If the tool helps the developer learn something useful, surface the right function, or explain an unfamiliar concept in context, it adds real value. But if it offers shallow suggestions that miss the deeper meaning of the task, it may waste time rather than save it.
In short: AI’s real promise is not typing faster — it’s learning faster. And to deliver on that promise, it must target the right kind of knowledge gap.
2. GenAI as a Source of Knowledge
If we treat AI as a source of knowledge, then its usefulness depends on this comparison:
Who has more relevant knowledge about the task: the developer or the AI?
So, if the developer already has more or better-suited knowledge, then:
- The AI can’t add much value.
- Worse, it may introduce noise—irrelevant or incorrect suggestions that require time to review, correct, or discard.
- This results in net friction, not net support.
In contrast, if the developer lacks key knowledge, then:
- The AI may fill in helpful gaps (e.g. how to use an unfamiliar API or write a specific test).
- In this case, the knowledge gap is real, and AI can help bridge it efficiently.
GenAI tools often lack the tacit, context-specific knowledge needed for effective assistance in mature codebases.
3. When GenAI Slows You Down: Tacit Knowledge and Hidden Context
One of the most surprising findings in the study was this: experienced developers were slowed down the most when using GenAI — especially on tasks they were already familiar with. From a Knowledge-Centric Perspective, this makes perfect sense.
When a developer already holds deep, project-specific knowledge, they don’t need much help bridging the gap from problem to solution. In fact, bringing AI into the process can add friction rather than value. The core principle is this:
When the developer’s tacit knowledge exceeds the AI’s, the AI becomes redundant—or even counterproductive.
Why Tacit Knowledge Defeats the Model
| Symptom of slowdown | Underlying knowledge mismatch | Supporting evidence |
| --- | --- | --- |
| Irrelevant or misplaced edits | AI lacks unwritten architectural “maps”: namespace conventions, file boundaries, legacy constraints | “AI often acts like a new contributor … doesn’t pick the right location to make the edits.” |
| Missed runtime edge cases | The model cannot infer operational quirks encoded only in senior engineers’ heads | “It doesn’t know we need to take care of this weird case of backward compatibility … and this is very hard to give as context.” |
| Redundant advice for experts | The developer’s tacit knowledge already exceeds AI output quality | “If I am the dedicated maintainer of a very specialized part of the codebase, there is no way agent mode can do better than me.” |
| Excess time spent triaging AI code | Reviewing, cleaning, or reverting low-context suggestions eclipses any typing speed-up | Developers spent ~9% of their time just reviewing/cleaning AI output; 75% read every generated line and 56% report major rewrites. |
| Greatest slowdown on familiar tasks | When prior-task exposure is high (small explicit gap), AI adds overhead instead of insight | Issues with high prior exposure showed the largest slowdowns (Figure 7 in the paper): developers already knew what needed to be done, and AI didn’t help them move faster. |
Mechanism in Knowledge-Centric Terms
- Invisible gap: The “missing” information lives only in developers’ heads, past decisions, or scattered tribal lore.
- Unprompt-able: Articulating all of that context in a prompt is infeasible (“very hard to give as context”).
- Verification overhead: Every AI suggestion must be audited against hidden constraints, offsetting any initial speed advantage.
Hence, more knowledge ≠ faster with AI; the richer your tacit model of the system, the more noise AI introduces.
Why This Happens
Tacit knowledge — like design rationale, naming conventions, and subtle dependencies — is not written down. It lives in the minds of experienced developers and in years of collective team habits. AI tools don’t have access to this hidden layer of meaning. Even the best model can’t use knowledge it doesn’t know exists.
Strategic Implications
| Do | Don’t |
| --- | --- |
| Document key constraints: surface hidden compatibility rules and naming idioms so AI can “see” them. | Expect GenAI to infer unwritten rules or decades of institutional memory. |
| Use AI selectively: turn it off on modules you personally steward or that carry heavy legacy baggage. | Blanket-enable AI for every task; indiscriminate usage yields review fatigue. |
| Treat AI output as a junior draft: budget review time proportional to the depth of tacit context involved. | Assume autocomplete speed equals productivity; the real bottleneck is knowledge validation, not keystrokes. |
By recognising that GenAI is a conditional helper, teams can avoid the slowdown trap: deploy AI where the knowledge gap is explicit and bridgeable, and rely on human expertise where tacit context reigns.
Strategic Takeaway
More expertise doesn’t make AI more helpful — it often makes it less useful.
When developers are working in areas they know well, GenAI adds overhead:
- It can distract them with off-target suggestions.
- It requires review and cleanup that wouldn't be necessary if they coded directly.
- It doesn’t understand the “why” behind many design choices.
For managers, this means GenAI may be least effective when used by your most experienced developers on familiar systems. In these cases, the cost of misunderstanding often outweighs the benefit of autocomplete.
4. When AI Closes the Gap: Effective Source of External Knowledge
In contrast to the slowdown observed on tasks where developers already possess deep tacit knowledge, the study also shows where GenAI shines: bridging external, well-defined knowledge gaps. These are situations where the developer lacks specific knowledge, but the missing information is easy for the AI to find, explain, or generate, especially when the problem is well-scoped and the question is clear.
GenAI does deliver tangible benefits when the missing knowledge is external, explicit, and prompt-able. Below are the key situations in which AI demonstrably helped participants — and why these align with the Knowledge-Centric Perspective:
| Situation where AI added value | Why the gap was bridgeable | Supporting evidence |
| --- | --- | --- |
| First-time use of unfamiliar tooling (e.g., Git hooks) | All required knowledge was public and could be captured in a succinct prompt; no hidden project constraints. | A developer estimated that without AI, implementing Git hooks “would’ve taken me [3 additional hours]”. |
| Exploring unknown parts of one’s own codebase | Although the repository was large, the knowledge was explicit (a helper function existed); AI’s search abilities surfaced it instantly. | “Cursor found a helper test function that I didn’t even know existed when I asked it how we tested deprecations.” |
| Learning domain-specific testing patterns (e.g., frontend test scaffolding) | Patterns could be described formally; AI supplied boilerplate code the developer could verify quickly. | Cursor was “super helpful in figuring out how to write a [frontend test] … it came up with this solution.” |
| Answering generic technical questions (e.g., dataset quirks or error messages) | The knowledge lives in public docs and internet text; AI compresses search and synthesis time. | AI answered “general questions about … EICAR”, an unfamiliar dataset, saving the developer lookup time. |
| Clarifying system boundaries in unfamiliar modules | Boundary rules are documented in code comments and interfaces; AI can parse and summarise them. | Participants reported AI helped “identify where one module’s responsibility ends and another begins.” |
Why these gaps are AI-friendly
- Externally codified – The required information exists in documentation, public APIs, or visible code—so the model can retrieve or infer it.
- Low tacit overhead – Little hidden context is needed; success does not depend on knowledge internal to another developer, unwritten conventions or legacy constraints.
- Prompt-able – Developers can formulate a clear question (“How do I register a pre-commit Git hook?”); the answer space is well-bounded.
- Easy human validation – The developer can quickly check AI output (a code snippet or command) against docs or a compiler, minimising risk.
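As a concrete illustration of such a well-bounded, easily verified gap, here is a minimal Python sketch of what “registering a pre-commit Git hook” amounts to: placing an executable script at `.git/hooks/pre-commit`. The temporary directory and the trivial check script are our illustration, not an example from the study.

```python
# Hedged sketch: a pre-commit Git hook is just an executable script at
# .git/hooks/pre-commit; Git runs it before recording each commit and
# aborts the commit on a non-zero exit. Paths here are illustrative.
import os
import stat
import tempfile
from pathlib import Path

repo = Path(tempfile.mkdtemp())      # stand-in for a real checkout
hooks_dir = repo / ".git" / "hooks"
hooks_dir.mkdir(parents=True)        # `git init` normally creates this

hook = hooks_dir / "pre-commit"
hook.write_text(
    "#!/bin/sh\n"
    "# Run quick project checks; a non-zero exit aborts the commit.\n"
    "exit 0\n"
)
hook.chmod(hook.stat().st_mode | stat.S_IEXEC)  # hooks must be executable

print(os.access(hook, os.X_OK))  # True
```

Because the whole recipe fits in a short prompt and the result can be checked with one test commit, this is exactly the kind of gap an AI assistant can close quickly.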
By matching the type of knowledge gap to the strengths of GenAI, teams can capture speed-ups where they are real—and sidestep friction where the tool is blind.
Where GenAI Proved Helpful
- Unfamiliar tools and workflows. One developer, trying Git hooks for the first time, estimated that AI saved them three hours. The AI offered working examples and helped them learn a tool they had never used before.
- Hidden but useful code. In a large codebase, even experienced contributors can miss helpful internal utilities. In one case, a developer asked the AI how to write a test and was surprised when it found a helper function they didn’t know existed.
- Clear technical questions. For general tasks like writing boilerplate code, setting up tests, or explaining error messages, GenAI delivered fast, accurate results. These tasks had explicit answers the model could generate reliably.
- Clarifying domain boundaries. Developers mention that AI was helpful “in identifying where one module’s responsibility ends and another begins” in an unfamiliar system, which suggests AI helps clarify system boundaries: another form of externalized, discoverable knowledge.
- Language-specific questions. One developer describes using the AI to help with a “Rust error that I’ve seen before, but still don’t fully understand.” The AI filled in an understanding gap about the language’s behavior and error model.
Why These Gaps Are Bridgeable
What makes these situations ideal for GenAI?
- The missing knowledge is external — not buried in the developer’s head or past decisions.
- It’s discoverable — AI can find it in the codebase or from public documentation.
- It’s prompt-able — developers can clearly describe what they need in a few sentences.
- It’s verifiable — they can quickly check if the AI’s output works or needs changes.
Strategic Takeaway
GenAI is most effective when the knowledge gap is:
✔ Large (the developer truly doesn’t know)
✔ Explicit (the need is clear and specific)
✔ Verifiable (the output can be tested quickly)
In these cases, GenAI speeds up learning and execution. For engineering managers and CTOs, this means GenAI can be a great fit for onboarding, unfamiliar APIs, and exploratory coding—but not for deep architectural decisions or legacy modules with hidden complexity.
5. Rethinking GenAI as a Conditional Knowledge Amplifier
The study makes one thing clear: GenAI isn’t a one-size-fits-all productivity booster. Instead, think of it as a conditional amplifier: it only turns up your team’s output when it can reliably fill in missing knowledge. If the gap is hidden or tied up in decades of unwritten practices, AI may just add noise.
GenAI adds value only when three conditions line up:
- The team is missing real knowledge.
- That knowledge lives somewhere the AI can reach (docs, code, the public web).
- The gap can be described in a clear prompt and the answer can be tested quickly.
If any of those pieces are missing, AI suggestions risk adding noise, not speed.
To decide when—and how—to use GenAI effectively, evaluate each task against three simple questions:
| Question to ask | Why it matters | Quick check |
| --- | --- | --- |
| Is this task familiar or new? | Familiar tasks mean the developer already has the answers; AI adds little. | “Have we done something almost identical before?” |
| Does success depend on tacit or explicit knowledge? | Tacit rules (legacy quirks, tribal conventions) aren’t in the prompt; AI can’t see them. | “Could a new hire solve this with docs alone?” |
| Can I prompt and verify fast? | Clear, bounded questions plus easy tests mean low risk; vague goals mean high cleanup cost. | “Can I frame this ask in two lines, and confirm the result with one quick run?” |
Task familiarity
- Low familiarity: Great fit for AI (e.g., exploring a new API).
- High familiarity: AI likely adds friction on well-known code.
Tacit vs. explicit knowledge
- Explicit needs: Documentation, error messages, or clear patterns—AI can retrieve or generate these.
- Tacit needs: Design rationale, team conventions, or legacy quirks—better left to human experts.
Promptability & verification cost
- Easy to prompt: You can ask a clear question in a line or two.
- Low cost to verify: You can compile, test, or review the AI’s answer in minutes.
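The three checks above can be sketched as a tiny pre-flight function. The field names and the all-or-nothing rule are our illustration, not a scoring formula from the METR study:

```python
# Hedged sketch: the three task-fit questions as a pre-flight checklist.
from dataclasses import dataclass

@dataclass
class Task:
    familiar: bool         # "Have we done something almost identical before?"
    docs_sufficient: bool  # "Could a new hire solve this with docs alone?"
    quick_to_verify: bool  # "Can I confirm the result with one quick run?"

def good_ai_fit(task: Task) -> bool:
    """AI fits when the gap is new, explicit, and cheaply verifiable."""
    return (not task.familiar) and task.docs_sufficient and task.quick_to_verify

# First-time Git hooks: unfamiliar, publicly documented, quick to test.
print(good_ai_fit(Task(familiar=False, docs_sufficient=True, quick_to_verify=True)))   # True
# A legacy module you personally steward: familiar, full of tacit constraints.
print(good_ai_fit(Task(familiar=True, docs_sufficient=False, quick_to_verify=False)))  # False
```

All three conditions must hold: a single failing check (a familiar task, tacit knowledge, or costly verification) is enough to tip AI use from speed-up to friction.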
By matching GenAI to tasks where these conditions hold, teams can amplify real knowledge gaps and unlock genuine speed-ups—while avoiding the slowdown traps in context-heavy areas.
Practical Guidelines for Teams
- Use AI for onboarding and exploration. New libraries, CLI tools, boiler-plate tests—anything where the needed info is public and verifiable.
- Turn AI off (or down) for deep, legacy work. Modules filled with hidden business rules or hacks are better left to the humans who know them.
- Write prompts like you write tickets. Clear acceptance criteria, specific inputs and outputs. The sharper the prompt, the better the answer.
- Budget review time. Treat AI output like code from a junior dev: useful drafts, not drop-in replacements.
- Measure impact task-by-task. Track whether AI actually reduces time-to-merge or bug count. If not, adjust usage.
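The last guideline can be as simple as comparing median time-to-merge for AI-assisted versus unassisted tasks. A minimal sketch, with the task records and hours invented for illustration:

```python
# Hedged sketch of per-task impact measurement. The records and numbers
# below are made up; real data would come from your issue tracker or CI.
from statistics import median

completed_tasks = [
    {"used_ai": True,  "hours_to_merge": 5.0},
    {"used_ai": True,  "hours_to_merge": 7.5},
    {"used_ai": False, "hours_to_merge": 4.0},
    {"used_ai": False, "hours_to_merge": 6.0},
]

def median_hours(used_ai: bool) -> float:
    """Median time-to-merge for tasks with or without AI assistance."""
    return median(t["hours_to_merge"] for t in completed_tasks
                  if t["used_ai"] == used_ai)

# If AI-assisted tasks take longer, adjust where the tool is used.
print(median_hours(True), median_hours(False))  # 6.25 5.0
```

Segmenting the same comparison by task familiarity or module would show where AI helps and where it hurts, in line with the study’s Figure 7 finding.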
Bottom line: GenAI boosts productivity only under the right conditions. Treat it like any other tool — deploy where it amplifies knowledge, avoid where it clashes with hard-earned expertise, and you’ll capture real speed-ups without hidden slowdowns.
Conclusion
The METR study didn’t prove that GenAI is useless — it showed where and how it helps. When developers face clear, external gaps in knowledge, AI can speed up onboarding, exploration, and routine coding tasks. But in areas rich with hidden context — legacy modules, unwritten rules, or deep architectural quirks — AI often slows things down.
By thinking in terms of knowledge gaps, teams can unlock real productivity gains:
- Use AI for new APIs, boilerplate, and general technical look-ups.
- Skip AI on code you know inside out or systems dense with tacit constraints.
- Always ask: “Is this a gap AI can bridge effectively?”
With a Knowledge-Centric mindset, GenAI becomes a powerful amplifier exactly where you need it — and stays quiet where human expertise reigns supreme.

Dimitar Bakardzhiev