
Building a Bike vs. Running a Production Line
Think about how you'd build a single bicycle in your garage. You grab a wrench, bolt the wheels onto the frame, eyeball the alignment, and tighten things down. Something doesn't quite fit? You loosen a bolt, nudge the frame, and try again.
The tolerances are loose because you are the integrator. Your hands and eyes close every gap. If the seat tube is a millimeter off, you adapt the next step to match.
That's what writing software by hand looks like. You hold the whole thing in your head. You write a function, realize it doesn't quite plug into the caller, and you adjust both sides until they meet in the middle. Mistakes get corrected as you go. The codebase is forgiving because a single mind is reconciling the inconsistencies in real time.
Now picture a Toyota plant. Or, going back further, a Ford Model T line in 1913. The frame arrives at station two, and station two does not get to renegotiate what the frame looks like. If the bolt holes aren't where the spec says they are, the line stops.
Everything moves fast, orders of magnitude faster than the garage. But only because every handoff between stations is rigidly defined. There is no "we'll just shave a bit off this side to make it fit." The output of station one is the input contract for station two, and that contract is non-negotiable.
AI-assisted development is the production line.
Speed Exposes Sloppy Boundaries
When an AI is writing code at the pace it now does, you are no longer the careful integrator nudging things into alignment. You are the line manager. The AI is producing parts. Other parts of the system are consuming them: sometimes other AI sessions, sometimes future-you.
If the seam between two modules is fuzzy, the AI will fudge it the same way you would have in the garage. Shave a bit off here, add a small adapter there, special-case a caller to deal with an inconsistent return shape. Each individual fudge looks reasonable. In aggregate, the codebase rots.
And it rots faster than you'd expect, because AI-generated changes are systematically larger and sloppier than what a careful human would write. A human asked to add a single field tends to add a single field. An LLM given the same task often writes eighty lines, touches four files, and reorganizes a helper "while it's there."
Some of those extra lines are exactly right. Some are subtly wrong. Some are confidently fabricated: references to functions that don't quite exist, behaviors that were never specified, conventions that don't apply to this codebase. Without sharp seams, you have no efficient way to tell which is which short of reading every line. And reading every line at AI throughput is exactly the bottleneck the tool was supposed to remove.
This is the part that surprises people. The instinct is to think AI makes modularity less important because the AI can "just figure out" how the pieces fit. The opposite is true.
The AI's context window is finite. It pulls in a handful of files, makes a local decision, and moves on. If the boundaries between modules are blurry, the AI has no way to know whether a change belongs here or there. It guesses. The guesses accumulate. The lines get blurrier with every commit, and the next AI session inherits a worse map of the territory than the last one had.
The Author Has Changed
The thing writing your code is no longer a careful human. It is a stochastic function with a finite working memory. Run it twice on the same prompt and you get two different answers. When it does not have the context it needs, it does not stop. It fills in, confidently, in a way that looks indistinguishable from the rest of the code.
This breaks implicit conventions. "We always validate at the controller" works when a human reads the code first. With an LLM, that convention becomes probabilistic: followed when the model happens to see the right examples, ignored when it does not. Either write the convention down and enforce it mechanically (a type, a lint rule, a contract test, a schema), or stop relying on it.
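Mechanical enforcement can be as small as a type that refuses to construct invalid data. A minimal sketch, assuming a hypothetical `CreateUserRequest` with illustrative fields:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreateUserRequest:
    """The 'validate at the controller' convention, written down as a type."""
    email: str
    age: int

    def __post_init__(self):
        # Enforcement is mechanical: invalid data cannot cross this boundary,
        # no matter who (or what) wrote the calling code.
        if "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        if not 0 <= self.age <= 150:
            raise ValueError(f"age out of range: {self.age}")
```

Downstream code that accepts a `CreateUserRequest` no longer has to remember the convention. The constructor is the convention.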
Modularity Is Old. The Stakes Are New.
Modularity isn't a new idea. Every senior engineer has been preaching it for decades. What has changed is the cost of not having it.
In the manual era, weak boundaries cost you slow onboarding and the occasional bug. A human reading the code could still hold enough of it in their head to make a coherent change.
In the AI era, weak boundaries cost you correctness, and they cost it faster than the old failure modes did. The AI cannot hold the whole repo in its head. Not really, not even with a million-token context window. So it relies on the boundaries you have drawn to know where its responsibility ends. If you have not drawn them, the AI will draw them implicitly, badly, and differently every time.
When you do draw them well, explicitly, with types and tests and schemas, you get three compounding dividends. They are the reason the investment in engineering discipline pays back, and pays back more than it used to.
Not every seam deserves the same weight. Rigor scales with the boundary: heavier where modules cross sessions, teams, or services; lighter inside a single module that one person (or one context window) can still hold whole. The point isn't to gold-plate every line. It is to be deliberate about which seams are load-bearing and to enforce those mechanically.
1. Changes become directional
When the boundary is explicit, the AI knows exactly what it is allowed to touch. A ticket like "add a new field to the response of getOrder" becomes a small, well-scoped change: update the schema, populate the field, update the contract test, done.
Without that boundary, the same ticket can become a sprawling refactor. The AI dives into the service layer, then the persistence layer, then the migrations, "improving things along the way" because nothing in its context tells it where the assignment ends. Explicit boundaries are how you give the model a stop sign. The narrower the scope, the less room there is for invention.
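A sketch of what that stop sign can look like in practice. The names here (`get_order`, the schema keys) are illustrative stand-ins for the getOrder example above, not an API the essay prescribes:

```python
# The seam, written down: every key in the response and its type. Adding a
# field means touching exactly this schema, the producer, and the test.
ORDER_RESPONSE_SCHEMA = {
    "id": str,
    "status": str,
    "total_cents": int,  # the newly added field
}

def get_order(order_id):
    # Illustrative producer; a real one would hit storage.
    return {"id": order_id, "status": "shipped", "total_cents": 1299}

def check_contract(payload, schema):
    # Mechanical check at the seam: exactly these keys, exactly these types.
    assert set(payload) == set(schema), "keys drifted from the schema"
    for key, typ in schema.items():
        assert isinstance(payload[key], typ), f"{key} is not {typ.__name__}"
```

Anything the model "improves along the way" that changes the response shape fails `check_contract` immediately, instead of surfacing three modules downstream.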
2. Tests get sharp
An explicit boundary is something you can write a test for. When the test passes, you have a strong, mechanical signal that the change is safe at the seam, regardless of what happened inside.
When all you have is "the application still mostly works" as your test, you have delegated correctness to the human reviewer. That is exactly what you cannot afford when changes are landing at AI speed. The contract test is the QA station between two assembly steps. It catches the bad part before it travels three modules downstream and becomes nearly impossible to trace back to its origin.
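In its smallest form, the QA station is just a test that exercises the seam and nothing else. A hedged sketch, with a hypothetical pricing seam:

```python
def price_cart(items):
    """Seam: a list of (quantity, unit_cents) pairs in, a total in cents out.

    The internals are free to churn; only this promise is pinned.
    """
    return sum(qty * unit_cents for qty, unit_cents in items)

def test_pricing_contract():
    # The promise, stated mechanically: an empty cart is free,
    # and quantities multiply unit prices.
    assert price_cart([]) == 0
    assert price_cart([(2, 500), (1, 99)]) == 1099
```

An internal rewrite that keeps this test green is safe at the seam by construction. One that turns it red stops the line before the bad part travels anywhere.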
3. Cognitive load drops, for the AI as much as for you
A small, well-bounded module is easier for a human to hold in their head. It is also dramatically easier for an LLM to load into its context and reason about. This is the part most teams miss. You are not just optimizing your own working memory; you are optimizing the working memory of the thing writing your code.
A 200-line file with one responsibility leaves almost the entire context budget free for actually solving the problem. A 2,000-line file with five overlapping concerns leaves the model juggling, and a juggling model is the one that fabricates.
The same logic applies recursively to the seams themselves. A good boundary rarely changes. Once it stabilizes, it becomes part of the substrate that everything else can be reasoned against: a fixed point in a fast-moving codebase. Most of the churn happens inside modules; the seams stay still.
That stillness at the boundary is what makes the system tractable for humans and machines alike over time. Bad design degrades faster under AI than it did under humans, but good design also accelerates more. The curve is just steeper in both directions.
Stillness is not stagnation. The boundaries you drew last quarter will not match what the system needs next quarter, and pretending otherwise is how good design ossifies into a fossil. Discipline is not a one-time cut; it is the willingness to refactor the seams when the domain has moved, with a test suite that makes the refactor cheap. Skip that loop and every well-drawn seam eventually becomes a lie the AI is doing its best to honor.
Tests Are the Ratchet
None of the above works without one more piece, which is the part it is most tempting to skip. AI gives you raw forward velocity. Velocity is not the same as progress. Velocity without a safety net is drift. Fast in some direction, including, occasionally, backwards.
An AI-assisted codebase can ship a working feature in twenty minutes and re-break a feature you fixed last month in twenty seconds. If nothing tells you the second thing happened, the net motion is approximately zero. You are not going forward. You are agitating in place at high speed and patting yourself on the back for the throughput.
A good test suite is a ratchet. It converts progress into a one-way thing. When you make a change and the suite stays green, you have the right to keep the change. When it goes red, you revert.
This is not about ceremony. It is about asymmetry. Without the ratchet, every step forward is also a step that might quietly undo three previous steps, and you will not find out until much later, when the cause is buried under hundreds of subsequent commits.
In the manual era you could get away with a thin test suite, because the human writing the code was, themselves, a partial regression detector. They remembered what the code was supposed to do. They noticed when their change felt off and slowed down.
That memory does not exist on the other side of an LLM call. The model has no episodic memory of the prior fix; it has the code in front of it and a probabilistic guess about what the code is for. If the test suite did not capture the prior decision, the prior decision will be lost. Sometimes immediately, sometimes a few sessions later, but eventually and silently.
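The prior decision has to live somewhere the model cannot forget it, which in practice means a test. A sketch, where the SKU rule is a hypothetical stand-in for "the fix from last month":

```python
def normalize_sku(sku: str) -> str:
    # The prior fix: SKUs compare case-insensitively and ignore padding.
    # A comment like this can be lost in a rewrite; the test below cannot.
    return sku.strip().upper()

def test_sku_fix_survives():
    # Externalized memory: if a later session drops the strip() or the
    # upper(), the suite goes red instead of the bug returning silently.
    assert normalize_sku("  ab-123 ") == "AB-123"
```

The test is the episodic memory the LLM lacks, stored in a form it cannot argue with.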
So the safety net has to be deterministic, and it has to be extensive. Deterministic, because the author is not. You cannot rely on a stochastic process to police a stochastic process. Extensive, because every gap in the net is a place where AI-speed regressions accumulate without being noticed.
Types, schema validation, contract tests at every module boundary, integration tests that exercise real flows. The layers are not redundant. Each one catches a different kind of drift. The cost of building the net feels high until you weigh it against the alternative: a codebase that erodes faster than humans can read it.
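A toy illustration of why the layers are not redundant, using hypothetical invoice functions. The shape check and the flow test catch different kinds of drift, and neither subsumes the other:

```python
def create_invoice(amount_cents):
    # Layer 1, shape: malformed input is rejected at the boundary.
    if not isinstance(amount_cents, int) or amount_cents < 0:
        raise TypeError("amount_cents must be a non-negative int")
    return {"amount_cents": amount_cents, "status": "open"}

def pay_invoice(invoice):
    invoice["status"] = "paid"
    return invoice

def test_payment_flow():
    # Layer 2, behavior: a well-shaped invoice must also move through the
    # real flow correctly. Shape checks alone would never notice a payment
    # that forgets to flip the status.
    inv = pay_invoice(create_invoice(1299))
    assert inv["status"] == "paid"
    assert inv["amount_cents"] == 1299
```

Delete either layer and a whole class of drift becomes invisible until it reaches production.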
The New Job
The valuable work is shifting. Less of it is typing the code that goes inside a module. More of it is deciding where the modules are, what flows between them, how those flows are verified, and what the safety net is required to catch.
You are designing the production line, choosing where the QA stations go, and writing down the rules that the new authors (the ones that are stochastic and forgetful and confident) cannot violate without setting off an alarm.
If you do that job well, AI-assisted development is faster than anything that came before it. The design pays off more than ever, because the velocity it unlocks is genuinely high. If you do it poorly, you get a garage full of half-aligned parts moving at production speed. That's worse than either extreme alone, because the rate of damage outruns the rate of attention.
The bike still gets built. The question is whether you ship one a year or one a minute, and whether anyone wants to ride the result.