Knowledge Management Is AI Infrastructure

Your next AI bottleneck is probably named Steve.

And only two people know how to find him.

Most organizations say they want better AI output. What they actually need is better knowledge retrieval. If critical know-how lives in one expert, one chat thread, one old slide deck, or one undocumented workaround, your AI system cannot use it reliably. The model is not failing because the model is bad. It is failing because the corpus the harness can retrieve from does not contain what the agent needs to do the work, and the missing knowledge cannot be subpoenaed from a human's head at retrieval time.

For decades, knowledge management was the project that always slipped. It was virtuous, it was valuable, it was perpetually next quarter's priority. The reason it slipped was that the cost of writing things down was paid by individual experts who already had the knowledge in their heads, and the benefit accrued diffusely to a future team member who might or might not exist. The math never quite worked, so the work never quite happened. Wikis decayed. Runbooks aged out. ADRs went unwritten. The institutional memory stayed in the institution's actual memory — distributed across a few senior people and a Slack history nobody could fully search.

The teams whose AI actually works did the math differently. They figured out, faster than the rest, that knowledge management is not a documentation discipline anymore. It is infrastructure. It is the substrate the harness retrieves from. If it is not there, the agent does not have it. If it is partial, the agent works against partial information. The cost-benefit ratio of writing things down stopped being one expert's problem and started being a property of the production system.

Why "just give AI everything" fails

The instinct most organizations have, when they discover the agent does not know something, is to expand what gets dumped into the context window. More files, more docs, more code. If thirty files did not work, try three hundred.

Three hundred does not work either. Attention is finite. The model divides its attention across everything in the window, and the quality of output depends on the relevance of what occupies that window. Liu et al. (2023) demonstrated the "lost in the middle" phenomenon: models use information best when it sits at the beginning or end of a long context and do measurably worse when it sits in the middle. A medium codebase exceeds even large windows. A knowledge base of any real size demolishes them. More tokens dilute attention; they do not focus it.

The naive "include everything" approach also carries two costs. The first is processing cost: every irrelevant token is paid for in compute and latency. If 80% of the context is noise, 80% of what you pay for is waste, on every operation, every day. The second is output drift: the model finds patterns in the noise. It picks up signals from material that should not have been there and produces output that references the wrong files, the wrong conventions, or assumptions that hold only in unrelated parts of the codebase. The output looks reasonable. It is wrong in ways that are expensive to detect.
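The processing-cost arithmetic is easy to make concrete. The sketch below is illustrative only: the window size and the operations-per-day figure are hypothetical numbers chosen for the example, and only the 80% noise share comes from the text above.

```python
def wasted_tokens_per_day(context_tokens: int, noise_fraction: float, runs_per_day: int) -> int:
    """Tokens per day spent on context the task did not need."""
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must be between 0 and 1")
    wasted_per_run = round(context_tokens * noise_fraction)
    return wasted_per_run * runs_per_day


# Hypothetical workload: a 100,000-token context, 80% noise, 500 operations a day.
print(wasted_tokens_per_day(100_000, 0.80, 500))  # 40000000
```

Multiply the result by whatever your provider charges per token and the waste becomes a line item rather than a feeling.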

The challenge is not finding information. It is finding the right information. The harness solves this with context assembly — a retrieval layer that, given the current task, returns the small set of artifacts most relevant to it, ordered for attention. Selection, structure, flow, and assembly become functions the system runs, not habits the developer maintains.
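A minimal sketch of what context assembly might look like, under loud assumptions. The `Artifact` type, the word-overlap `relevance` score, and the token budget are all hypothetical stand-ins; a real harness would more likely use embeddings and a proper tokenizer. The point is the shape: select by relevance, stay inside a budget, then order the result so the strongest material sits at the edges of the window, where the "lost in the middle" finding suggests it is used best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    text: str
    tokens: int  # assumed to be precomputed by the indexing pipeline


def relevance(task: str, artifact: Artifact) -> float:
    """Toy score: the share of task words that also appear in the artifact."""
    task_words = set(task.lower().split())
    if not task_words:
        return 0.0
    artifact_words = set(artifact.text.lower().split())
    return len(task_words & artifact_words) / len(task_words)


def assemble_context(task: str, corpus: list, token_budget: int) -> list:
    """Select relevant artifacts within the budget, then order them for attention."""
    # Selection: most relevant first; skip anything irrelevant or too large to fit.
    ranked = sorted(corpus, key=lambda a: relevance(task, a), reverse=True)
    selected, used = [], 0
    for artifact in ranked:
        if relevance(task, artifact) == 0.0:
            continue
        if used + artifact.tokens > token_budget:
            continue
        selected.append(artifact)
        used += artifact.tokens

    # Ordering: alternate ranked items between the front and the back of the
    # window so the weakest selections land in the middle.
    front, back = [], []
    for rank, artifact in enumerate(selected):
        (front if rank % 2 == 0 else back).append(artifact)
    return front + back[::-1]
```

With five selected artifacts ranked 1 through 5, this ordering yields 1, 3, 5, 4, 2: the two strongest items bracket the window and the weakest sits in the center.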

But the assembly layer can only retrieve what has been captured. It can index documents that exist. It can search a corpus that has been built. It cannot pull a runbook that nobody wrote, an ADR that lives in a senior engineer's memory, or a conventions doc that was started in 2023 and abandoned. The infrastructure is only as good as the corpus it operates on.

The "ask Steve" antipattern

In most organizations, a meaningful share of operational knowledge lives in human routing logic. The artifact that matters is not "the doc"; it is "ask Steve, he set this up." The team functions because Steve, Maria, and Vikram each carry slices of the institutional memory in their heads, and everyone else knows who to ask for which slice.

This is human infrastructure. It works for human collaborators, who can ask, who can wait for an answer, who can build the relationship that makes the routing reliable, and who can train their replacements informally before they leave. It does not work for agents. An agent cannot ask Steve. An agent does not know Steve exists. An agent does not have a Slack DM channel. An agent cannot wait for Steve to come back from PTO. An agent retrieves what is in the corpus, and if Steve's knowledge is not in the corpus, Steve's knowledge is not in the run.

The scale-out problem is even worse. The whole reason an organization is adopting agents is to do more parallel work than the human team can do. If every agent operation that needs Steve's knowledge has to bottleneck through Steve, the number of concurrent operations the system can run is bounded by Steve's bandwidth — which is the same bandwidth the agents were supposed to multiply past. The leverage curve flatlines at Steve.

"Ask Steve" is not a strategy. It is a diagnostic. When the system depends on it, the system is telling you that operating knowledge is trapped in human routing logic, and that the trap is now an active blocker. The remedy is not to clone Steve. The remedy is to externalize Steve's knowledge into artifacts the harness can index and retrieve.

This conversion has historically been the part that did not happen, because Steve was busy and the documentation work seemed lower-priority than whatever Steve was actually being asked to do. The conversion still does not happen by itself. The difference now is that the cost of not doing it is no longer hypothetical — it shows up in agent operations that produce wrong output, in pipelines that bottleneck on a human approval that should have been automatic, and in the diffuse degradation of the AI investment that was supposed to be a multiplier.

Knowledge as a corpus, not a doc collection

The reframe that helps is to stop thinking about knowledge management as a documentation effort and start thinking about it as a corpus design problem.

A corpus is a structured body of artifacts that the harness can parse, chunk, embed, and retrieve. It is not a set of Word docs in SharePoint. It is not the union of Slack and Confluence. It is the set of artifacts that have been written with retrieval in mind, organized in places the indexing pipeline can reach, and maintained on cadences that match the rate at which the underlying knowledge changes.

Several things follow from that framing.

Specifications belong in the corpus, in canonical form, current. Not the ticket. Not the PRD from last quarter. The artifact the agent is expected to build against. If the spec lives in a doc that drifts from the implementation, the agent retrieves drift. If the spec is canonical and current, the agent retrieves intent.

Architectural decision records belong in the corpus. The decisions the team has actually made — what database, what auth model, what naming convention, what deploy strategy — have to exist as searchable artifacts, not as institutional memory. The ADR is the artifact. The folder structure is the addressing scheme. The retrieval layer can find them when needed.

Runbooks, conventions, and glossaries belong in the corpus. The "how we do things here" knowledge that lives in onboarding docs and senior-engineer hallway conversations needs to be expressed in artifacts the harness can index. Otherwise the agent reinvents the convention every time, and the output drifts away from how the team actually works.

Prior agent outputs belong in the corpus, with provenance. The history of what the system has decided is itself a knowledge source. When a new operation runs, retrieving the most relevant prior decisions — and their outcomes — produces grounding that the model alone cannot supply.

What does not need to be in the corpus is anything genuinely ephemeral: chat banter, transient discussion, work-in-progress that has not stabilized. The corpus is not a graveyard for everything anyone has ever written. It is a curated body of artifacts the system retrieves from. Curation is the discipline. Volume is not the metric.
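One way to make curation checkable is to keep a manifest of what the corpus contains. The sketch below is an assumption, not a standard: the `CorpusEntry` fields, the kind labels, and the review-cadence check are all invented for illustration. It encodes three ideas from this section: every artifact has a known kind, every artifact is reviewed on a cadence, and agent outputs carry provenance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Artifact kinds named in this section; the string labels are illustrative.
KINDS = {"spec", "adr", "runbook", "convention", "glossary", "agent_output"}


@dataclass(frozen=True)
class CorpusEntry:
    path: str
    kind: str
    owner: str
    last_reviewed: date
    review_every_days: int  # cadence matched to how fast the knowledge changes
    provenance: Optional[str] = None  # for agent outputs: the run that produced it

    def problems(self, today: date) -> list[str]:
        """Return the curation problems found for this entry, if any."""
        found = []
        if self.kind not in KINDS:
            found.append(f"unknown kind: {self.kind}")
        if today - self.last_reviewed > timedelta(days=self.review_every_days):
            found.append("overdue for review")
        if self.kind == "agent_output" and not self.provenance:
            found.append("agent output without provenance")
        return found


def audit(entries: list, today: date) -> dict:
    """Map each problematic entry's path to its list of problems."""
    report = {}
    for entry in entries:
        issues = entry.problems(today)
        if issues:
            report[entry.path] = issues
    return report
```

An empty audit report does not prove the corpus is complete. It only shows that what is already there has an owner, a review date, and a traceable origin.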

The ROI changed

The knowledge management (KM) project that always slipped was justified, in the old model, by future productivity gains for human team members. The math was real, but it was diffuse. The expert who wrote the runbook today saved future hours that nobody could pre-attribute. The work usually got deferred because the immediate cost was visible and the future benefit was not.

In the harness model, the math has changed. Every improvement in the corpus directly improves AI output quality, immediately, on the next operation. Adding the missing ADR is not a future benefit. It is a today benefit, the next time an agent has to make a decision in that area. Writing the convention down is not for the next hire. It is for the next agent run.

The benefit is also no longer diffuse. It is measurable. Output quality on a given workflow is a function of the corpus the harness retrieves from. Improvements in the corpus produce improvements in output that show up in validation pass rates, in the volume of human review required, in the rate at which workflows complete without intervention. The KM project that was always next quarter is now this quarter, because the leverage that the AI investment is supposed to produce is bounded by the corpus the harness has access to. Without that work, the investment hits a ceiling.
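The three signals named above can be computed from run records. This is a sketch under assumptions: the `RunRecord` fields are hypothetical, and your harness may log something quite different. What matters is computing the same rates before and after a corpus change and comparing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    passed_validation: bool
    needed_human_review: bool
    completed_unattended: bool


def workflow_metrics(runs: list) -> dict:
    """Rates for the three signals: validation, human review, unattended completion."""
    if not runs:
        raise ValueError("need at least one run to compute rates")
    total = len(runs)
    return {
        "validation_pass_rate": sum(r.passed_validation for r in runs) / total,
        "human_review_rate": sum(r.needed_human_review for r in runs) / total,
        "unattended_completion_rate": sum(r.completed_unattended for r in runs) / total,
    }
```

Run it over the operations before the missing ADR was added and again over the operations after. If the corpus change mattered, the difference shows up in the numbers.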

This is why teams that have figured this out treat knowledge management work as infrastructure work. It is funded, owned, and prioritized like the rest of the harness. There is somebody whose job it is to ensure the corpus is current, complete, and well-indexed. The role is not glamorous. It is load-bearing.

The diagnostic question

The infrastructure question is not "do we need knowledge management?" It is "can our AI find what it needs without a human pointing the way?"

Run the diagnostic on your most consequential agent workflow. Identify three pieces of operational knowledge the workflow depends on — a convention, an architectural decision, a runbook step. Ask, for each one: is this captured in an artifact the harness retrieval layer can index? Or does it live in a person's head, a Slack thread, or a doc that has not been touched since the last reorg?
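The diagnostic is simple enough to write down as a checklist. The sketch below is one hypothetical encoding: the `Location` categories mirror the places named above, and the "more than half" threshold is an assumption standing in for "most of the answers."

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    INDEXED_ARTIFACT = "artifact the retrieval layer can index"
    PERSON = "a person's head"
    CHAT_THREAD = "a chat thread"
    STALE_DOC = "a doc nobody has touched since the last reorg"


@dataclass(frozen=True)
class KnowledgeItem:
    description: str
    location: Location


def diagnose(items: list) -> tuple:
    """Return a verdict plus the items the harness cannot retrieve."""
    if not items:
        raise ValueError("list at least one piece of operational knowledge")
    missing = [i for i in items if i.location is not Location.INDEXED_ARTIFACT]
    # Assumed threshold: more than half unretrievable counts as a thin corpus.
    if len(missing) * 2 > len(items):
        verdict = "thin corpus"
    else:
        verdict = "corpus covers most of this workflow"
    return verdict, missing
```

The list of missing items is the useful output. Each one is a candidate for the next artifact to write, and each one needs an owner.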

If most of the answers are "in a head," the harness is operating against a thin corpus, and the AI investment is producing output that is plausible but ungrounded. The bottleneck is not the model. It is the corpus the model has to work against.

If you do not know who in your organization owns the answers — who is responsible for ensuring the corpus stays current, complete, and indexable — that is the more important finding. Knowledge management as infrastructure has to have an owner the way any other infrastructure has an owner. In a one-person software company, the founder owns it by default. In a hundred-person company, the architect-CEO function carries it as part of the harness charter.

Every improvement in knowledge searchability directly improves AI output quality. That deferred KM project? Its ROI just changed.

How searchable is your institutional knowledge — and who owns the answer?