There is one person at your company you hope never quits.
Not because he runs the business. Because he is the only one who understands why the system does what it does.
Call him Steve. Steve knows why invoices over a certain amount route through a second approval that appears in no policy document. He knows why the nightly job skips the third Tuesday of the quarter. He knows why one customer's orders bypass the standard workflow entirely, and he knows that the workaround that handles them has been load-bearing since before the last reorg. None of it is written down. It lives in Steve's head, where it has lived for the better part of a decade, and it surfaces only at the moment something breaks and someone asks the question only Steve can answer.
You have wanted to modernize the stack for two years. The platform is old, the talent pool that maintains it is shrinking, and every quarter the risk of running the business on it gets a little harder to defend to the board. So you have brought in vendors. Every vendor who quotes the work assumes the rules are documented somewhere — in a specification, a wiki, a comment in the code, a requirements document filed when the system was built. They are not. The documentation is Steve. The last modernization attempt stalled the moment the new system hit a case nobody remembered to mention, and it stalled quietly, the way these efforts usually do, with the budget reallocated and the old system still running because the old system still works and nobody can prove the new one will.
The legacy is not the technical problem
This is what most modernization conversations get wrong, and they get it wrong in a way that costs years.
They treat the legacy as a technical problem. The language is old — COBOL, or a version of Java nobody wants to touch, or a framework that went out of support before half the current team was hired. The database is old. The architecture is a monolith that should be services, or a tangle of stored procedures that should be application logic, or a batch pipeline that should be event-driven. All of that is true, and all of that is the easy part. Old technology is a known quantity. There are migration paths, reference architectures, and a labor market full of people who have moved systems off exactly this stack. The technical translation of a legacy system into a modern one is a solved problem in the sense that the steps are known even when the work is large.
The hard part is not the technology. The hard part is that your business rules were never externalized.
They were learned, applied, and then never written down once they worked. Someone, years ago, encountered a real problem (a fraud pattern, a regulatory requirement, a customer who threatened to leave, an edge case that corrupted a report) and solved it. They wrote the rule into the system, the rule worked, and the moment it worked it stopped being a decision anyone discussed and became simply the way the system behaves. The reasoning evaporated. The behavior remained. Multiply that by a decade of solved problems, and the system is no longer just running your business. It has become the only complete record of how your business actually runs.
That is the asset nobody put on the balance sheet. The code is not the asset; the code is replaceable. The rules encoded in the code are the asset, and they exist in exactly two places: the running system, where they are expressed in a language you are trying to retire, and Steve's head, where they are expressed in a language that retires when Steve does.
You cannot rebuild from a specification that does not exist
Once the problem is framed this way, the reason every clean-rebuild plan has stalled becomes obvious. A clean rebuild was never actually on the table.
A rebuild presupposes a specification. You build the new system against a definition of what it must do — the inputs it receives, the outputs it must produce, the rules it must enforce, the edge cases it must handle. That definition is the thing the rebuild is built against and validated against. Without it, "rebuild" means "reproduce a behavior nobody has fully written down," which is not an engineering task. It is an archaeology task wearing an engineering task's clothes, and the mismatch is why these projects fail in the same place every time: not at the start, when the well-understood happy path gets reimplemented quickly and the demo looks great, but three months in, when the new system meets the third Tuesday of the quarter and does the wrong thing, and the only person who knows it is the wrong thing is Steve, who was not in the room.
This is the same discipline the rest of this series has argued for, viewed from its hardest angle. The operating loop is specify, direct, validate — write the specification that defines what correct means, direct the implementation against it, validate the output against the specification rather than against someone's memory. The loop assumes the specification can be written. In a greenfield build, it can: you are defining new behavior, so you author the definition. In a legacy modernization, the specification already exists — it is just not written in any form you can read. It is distributed across a million lines of code and one employee's recollection, and it has never been assembled into a document a person or a coding agent could direct work against.
So the specification is not the starting input to the modernization. The specification is the first deliverable of the modernization. It has to be produced before any replacement can be specified, directed, or validated, and producing it is the work everyone has been skipping because it does not look like progress and it does not demo.
Excavation before replacement
What the work actually requires is excavation.
Excavation means pulling the rules out of the running system and out of Steve's head and encoding them somewhere durable — versioned, reviewable, testable — before you replace anything. It is the deliberate recovery of the institutional knowledge that the system has been silently accumulating, turned into the specification that the system never had. Replacement comes after, and replacement is comparatively easy once excavation is done, because at that point you are doing the solved technical task — building a new system against a known specification — rather than the unsolved one of reconstructing behavior from artifacts that were never meant to be read as documentation.
Excavation has two faces, and a serious effort works both at once. One face is the system itself: the code paths, the branch conditions, the data transformations, the configuration tables full of magic numbers, the batch jobs and their schedules, the integration points and their undocumented contracts. These are the rules made mechanical, and they can be read directly, traced, and exercised against historical data to recover what they actually do under every condition that has occurred. The other face is Steve: the reasoning behind the rules, the cases that have not occurred yet but that Steve is braced for, the workarounds whose absence would be catastrophic and whose purpose is invisible from the code alone. The system tells you what happens. Steve tells you why, and which of the behaviors are intentional rules versus accumulated accidents that everyone has stopped noticing.
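To make "exercised against historical data" concrete, here is a minimal sketch in Python. It assumes the history can be exported as rows of invoice amount plus whether the legacy system routed the invoice to a second approver. The record shape, the field names, and the sample amounts are all hypothetical. The function tries to recover the undocumented approval threshold from observed behavior alone, and it refuses to answer when the history does not fit a single-threshold rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class InvoiceRecord:
    """One historical observation (hypothetical shape, for illustration only)."""
    amount: float
    second_approval: bool  # did the legacy system route it to a second approver?


def infer_threshold(records: Iterable[InvoiceRecord]) -> Optional[Tuple[float, float]]:
    """Bracket the approval threshold using observed behavior.

    Returns (highest amount seen WITHOUT second approval,
             lowest amount seen WITH second approval).
    Returns None when either group is empty or when the two groups overlap,
    meaning amount alone does not explain the routing.
    """
    rows = list(records)  # materialize once so generators are safe to pass in
    without = [r.amount for r in rows if not r.second_approval]
    with_approval = [r.amount for r in rows if r.second_approval]
    if not without or not with_approval:
        return None
    low, high = max(without), min(with_approval)
    if low >= high:
        return None  # overlap: some other condition is in play
    return low, high


# Hypothetical history: the real threshold sits somewhere between 9800 and 10050.
CLEAN_HISTORY = [
    InvoiceRecord(1200.0, False),
    InvoiceRecord(9800.0, False),
    InvoiceRecord(10050.0, True),
    InvoiceRecord(25000.0, True),
]

# One small invoice that was second-approved anyway breaks the simple rule.
MESSY_HISTORY = CLEAN_HISTORY + [InvoiceRecord(500.0, True)]
```

The `None` result matters as much as the bracket. A history that a single threshold cannot explain is exactly the kind of finding that becomes a question for Steve rather than a guess written into the specification.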
The output of excavation is a specification: the rules, written down, with their reasoning attached, in a form that can be reviewed for correctness, tested against the system's actual historical behavior, and handed to whoever — or whatever — builds the replacement. That artifact is the thing the modernization was always missing. It is also, not incidentally, the thing that makes Steve replaceable in the only sense that matters: his knowledge is no longer a single point of failure, because it now lives in a document the organization owns rather than in a head the organization rents.
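As a rough illustration of what "written down, with their reasoning attached" and "tested against historical behavior" could look like in practice, the sketch below stores one rule as a Python record and checks it against exported history. Everything in it is an assumption made for the example: the field names, the rule identifier, the 10,000 threshold, and the sample rows.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One excavated rule. All fields here are illustrative, not a standard."""
    rule_id: str
    statement: str       # what the system does, in plain language
    rationale: str       # why it does it, as told by the person who knows
    confirmed_by: str    # who vouched that this is intentional
    expected: Callable[[Record], Any]  # what the rule predicts for a record
    observed_field: str  # history field holding what actually happened


def mismatches(rule: Rule, history: List[Record]) -> List[Record]:
    """Return the historical rows where the written rule disagrees with reality."""
    return [row for row in history if rule.expected(row) != row[rule.observed_field]]


# Hypothetical rule and history, for illustration only.
SECOND_APPROVAL = Rule(
    rule_id="INV-001",
    statement="Invoices above 10,000 require a second approval.",
    rationale="(pending: to be recorded from the expert's explanation)",
    confirmed_by="(pending)",
    expected=lambda row: row["amount"] > 10_000,
    observed_field="second_approval",
)

HISTORY = [
    {"amount": 2_500, "second_approval": False},
    {"amount": 18_000, "second_approval": True},
    {"amount": 12_000, "second_approval": False},  # the written rule gets this one wrong
]
```

Running `mismatches(SECOND_APPROVAL, HISTORY)` returns the 12,000 row. A mismatch does not say who is right. It says the written rule is incomplete, or the system has an exception nobody has recorded yet, and either way the specification is not finished.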
The excavation is now something you can do deliberately
Here is the part that has changed, and it is the reason this is worth raising now rather than as a permanent counsel of despair.
Until recently, excavation at this scale was impractical. Reading a million lines of legacy code to recover every rule, tracing every branch, reconstructing the behavior of every batch job, and cross-examining it against years of production data was a labor cost so large that organizations rationally chose to keep paying Steve instead. The excavation never happened because it could not be afforded, and so the institutional-memory bottleneck persisted by default, not by decision.
That calculation has moved. The same engineering capability this series has been describing — coding agents operating inside a harness, directed against specifications, validating their output systematically — is well suited to legacy archaeology. An agent can read the entire codebase, not the sample a human reviewer has time for. It can trace every conditional, enumerate every code path, surface every magic number and unexplained branch, and reconcile the code's behavior against historical inputs and outputs to determine what the rules actually are. It can draft the specification, flag the places where the code's intent is ambiguous, and produce exactly the targeted questions that turn Steve's decade of tacit knowledge into explicit, recorded reasoning — so that Steve's time is spent answering the questions only he can answer, rather than narrating the parts the system can already explain.
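A toy version of "surface every magic number" and "produce exactly the targeted questions" is sketched below. It is deliberately simplified: it uses Python's standard `ast` module on Python source, whereas a real legacy codebase would need a parser for its own language. It only looks at numeric literals inside comparisons, and the sample routing function, the 10000 threshold, and the customer id 4471 are invented for the example. It needs Python 3.9 or later for `ast.unparse`.

```python
import ast
from typing import List


def questions_for_magic_numbers(source: str, filename: str = "<legacy>") -> List[str]:
    """List one question per unexplained numeric literal used in a comparison."""
    tree = ast.parse(source, filename=filename)
    found = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Compare):
            continue
        for operand in [node.left, *node.comparators]:
            if not isinstance(operand, ast.Constant):
                continue
            value = operand.value
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            if value in (0, 1):
                continue  # too common to be worth a question
            text = ast.unparse(node)  # Python 3.9+
            found.append(
                (node.lineno, f"{filename}:{node.lineno}: why {value!r} in `{text}`?")
            )
    # ast.walk does not promise source order, so sort by line number.
    return [question for _, question in sorted(found)]


# Hypothetical legacy routing logic, for illustration only.
SAMPLE = '''
def route(invoice):
    if invoice.amount > 10000:
        return "second_approval"
    if invoice.customer_id == 4471:
        return "manual"
    return "standard"
'''

QUESTIONS = questions_for_magic_numbers(SAMPLE)
```

Each entry in `QUESTIONS` points at a line and a literal, which is the shape of question that uses Steve's time well: not "explain the system," but "why this number, here."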
The work is still real, and it still requires judgment — the agent recovers what the system does, but only Steve can confirm which behaviors are load-bearing rules and which are fossils safe to retire. What has changed is that excavation has gone from economically irrational to systematically achievable. The knowledge can be recovered, written into specifications, and made to outlive any single employee, on a timeline and at a cost that make doing it the responsible choice rather than the unaffordable one.
Name the real bottleneck
But it starts with naming the real bottleneck, because every plan that misnames it fails the same way.
It was never the budget. Organizations have spent the budget, more than once, on rebuilds that stalled. It was never the tooling. The tools to build a modern system have been available the entire time. The bottleneck is that the most valuable asset in your company is undocumented. It is concentrated in a running system you are trying to retire and in one person you cannot afford to lose, and it is one resignation away from walking out the door. Every modernization plan that treats the technology as the problem is solving the easy part and leaving the hard part, the part that actually determines whether the project succeeds, untouched until the moment it fails.
The reframe is the whole move. Stop scoping the work as "replace the old system" and start scoping it as "recover the rules the old system encodes, then replace the system." The first framing produces vendor quotes that assume a specification exists and projects that die when they discover it does not. The second framing produces excavation first, specification as the deliberate output of that excavation, and replacement as the last and safest step — built against a definition the organization owns, validated against behavior the organization has recovered, and no longer hostage to whether Steve stays another five years.
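"Validated against behavior the organization has recovered" can be pictured as a parity check: replay recorded inputs through both the old behavior and the replacement, and list every case where they disagree. The sketch below is a minimal Python version under stated assumptions. The two routing functions, the customer name, the threshold, and the recorded orders are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Order = Dict[str, Any]


def divergences(
    legacy: Callable[[Order], Any],
    replacement: Callable[[Order], Any],
    inputs: Iterable[Order],
) -> List[Tuple[Order, Any, Any]]:
    """Return (input, legacy output, replacement output) wherever the two differ."""
    found = []
    for item in inputs:
        old, new = legacy(item), replacement(item)
        if old != new:
            found.append((item, old, new))
    return found


def legacy_route(order: Order) -> str:
    """Stand-in for the old system, including a special case nobody mentioned."""
    if order["customer"] == "ACME":  # hypothetical bypass customer
        return "bypass"
    return "second_approval" if order["amount"] > 10_000 else "standard"


def new_route(order: Order) -> str:
    """Stand-in for a replacement built from the happy path only."""
    return "second_approval" if order["amount"] > 10_000 else "standard"


# Hypothetical recorded inputs, for illustration only.
RECORDED_ORDERS = [
    {"customer": "Initech", "amount": 500},
    {"customer": "Initech", "amount": 50_000},
    {"customer": "ACME", "amount": 50_000},
]

REPORT = divergences(legacy_route, new_route, RECORDED_ORDERS)
```

Here `REPORT` holds a single entry, the ACME order. That is the case that would otherwise have surfaced three months after launch, and the check only finds it if the recorded inputs happen to include it, which is why the replay complements the conversation with Steve rather than replacing it.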
The diagnostic question
What does Steve know that your business cannot afford to lose?
Answer it concretely, by name, this quarter. Identify the person — there is almost always a person — whose departure would leave a behavior in your systems that nobody else can explain. List what is in that head and nowhere else. Then ask whether your modernization plan has a line item for getting it out, written down, and verified, before the replacement work begins. If it does not, the plan is not a modernization plan. It is a bet that Steve does not quit before it ships, and that is not a bet a serious business should be making with its most valuable undocumented asset.
What does Steve know that your business cannot afford to lose?
