May 28, 2026

Stop shipping confidence you haven't earned

Read a passage on my Kindle this week that hit harder than it had any right to. It was about writing — specifically about people generating reports and articles with AI — but every sentence translated cleanly into the way I've been coding.

“Stop treating the generated file as the first thing you make. Make the truth layer first: an inventory of your sources, a map of which claim rests on which source, a log of every assumption, and a verification pass that tries to break the result before anyone else can. Build that, and the model gets far more useful, because now it is working on top of something real instead of guessing inside a costume that looks like work. Skip it, and you are shipping confidence you have not earned.”

The generated file. The thing that looks like work. The output you ship.

Substitute “the diff” for “the file” and the whole paragraph becomes a diagnosis of why AI-assisted coding so often produces convincing-looking code that's load-bearing in all the wrong places.

The costume problem

LLMs produce extremely well-tailored costumes. You ask for a function and you get one back in thirty seconds. It uses real-looking names. It imports modules that probably exist. It calls APIs with shapes that sound right. It wraps the whole thing in a tone of total certainty.

If there's no truth layer underneath, none of that is grounded. The names might be plausible but wrong. The imports might be from a different version of the library. The API shape might be from another framework entirely. The certainty is decorative.

This isn't unique to AI. We've been shipping confidently wrong code since the dawn of typing. But AI compresses the cycle. Getting from “I have a vague idea” to “I have a working-looking diff” used to take an hour of careful typing. Now it takes thirty seconds. The truth layer didn't have time to keep up.

What the truth layer actually is

Four pieces, all familiar, rarely stated together.

Inventory of sources. What you actually know. The file you're editing, the call sites that depend on it, the schema, the types, the tests, the prior decisions. Not “I think” — what you've read. This is the ground truth your changes rest on.

A map of which claim rests on which source. For every non-obvious line in your diff, you should be able to point at the thing that justifies it. Why this name? Because it's defined at lib/queries.ts:447. Why this shape? Because the API returns it. Why this branch? Because this test asserts it. If you can't answer, the line is a guess wearing a costume.
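One way to make the map concrete is to write it down as data and let a few lines of code find the holes. This is only a sketch under my own assumptions: the `KnownSource` and `Claim` shapes, the `findGuesses` helper, the `orders.test.ts` entry, and the sample diff lines are all hypothetical, invented for illustration.

```typescript
// Hypothetical shapes for illustration; none of this comes from a real library.
type SourceKind = "file" | "schema" | "test" | "api-response" | "decision";

interface KnownSource {
  id: string; // something you have actually read, e.g. "lib/queries.ts:447"
  kind: SourceKind;
}

interface Claim {
  diffLine: string; // a non-obvious line in the diff
  sourceId?: string; // the inventory entry that justifies it, if any
}

// Returns the claims that rest on nothing in the inventory: the guesses.
function findGuesses(inventory: KnownSource[], claims: Claim[]): Claim[] {
  const known = new Set(inventory.map((s) => s.id));
  return claims.filter(
    (c) => c.sourceId === undefined || !known.has(c.sourceId)
  );
}

const inventory: KnownSource[] = [
  { id: "lib/queries.ts:447", kind: "file" },
  { id: "orders.test.ts", kind: "test" }, // hypothetical test file
];

// Hypothetical diff lines, made up for this example.
const claims: Claim[] = [
  {
    diffLine: "const rows = await getOpenOrders(userId)",
    sourceId: "lib/queries.ts:447",
  },
  { diffLine: "if (rows.length === 0) return []", sourceId: "orders.test.ts" },
  { diffLine: "rows.sort(byCreatedAt)" }, // nothing justifies this one
];

const guesses = findGuesses(inventory, claims);
console.log(guesses.map((g) => g.diffLine));
```

Only the third line is flagged, because it is the only one that points at nothing. The code is trivial on purpose; the value is in being forced to fill in `sourceId` for each line.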

A log of assumptions. Things you're treating as true without verifying. “Assuming this function is idempotent.” “Assuming this runs server-side only.” “Assuming this env var is set in production.” Assumptions are not problems by themselves — every piece of code rests on them. The problem is leaving them implicit, so you forget which ones are propping up the diff.
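The log doesn't need tooling; a comment block in the PR works. If it helps to see the shape, here is a minimal sketch, assuming a made-up `Assumption` record and a `stillUnverified` helper. The statuses and the `howToCheck` notes are illustrative, not results from any real project.

```typescript
// Hypothetical record shape; the names are invented for this sketch.
type AssumptionStatus = "unverified" | "verified" | "refuted";

interface Assumption {
  statement: string; // what the diff treats as true
  status: AssumptionStatus;
  howToCheck: string; // the attack the verification pass should run
}

// Illustrative entries only.
const assumptionLog: Assumption[] = [
  {
    statement: "This function is idempotent",
    status: "unverified",
    howToCheck: "Call it twice with the same input and compare state",
  },
  {
    statement: "This runs server-side only",
    status: "verified",
    howToCheck: "Trace every import of the module",
  },
  {
    statement: "This env var is set in production",
    status: "unverified",
    howToCheck: "Read the deployment config instead of trusting memory",
  },
];

// The assumptions still propping up the diff without evidence.
function stillUnverified(log: Assumption[]): Assumption[] {
  return log.filter((a) => a.status === "unverified");
}

for (const a of stillUnverified(assumptionLog)) {
  console.log(`UNVERIFIED: ${a.statement} -> ${a.howToCheck}`);
}
```

Pairing each statement with a `howToCheck` is the design choice that matters here: an assumption you cannot describe a test for is one you are unlikely to ever verify.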

A verification pass that tries to break it. Not “does it compile” or “do the tests pass” — those are floor checks. A real verification pass takes the assumption log and attacks it directly. The function is supposed to be idempotent? Call it twice with the same input. The branch is supposed to handle empty input? Pass it []. The integration is supposed to retry? Kill the network. Try to make the costume fall off.
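Here is roughly what those three attacks look like as code. Everything below is a hypothetical stand-in for whatever the diff actually touched: `Members`, `total`, and `withRetry` are invented for this sketch, and the retry helper is synchronous only to keep the example short.

```typescript
// All names below are hypothetical stand-ins for code a diff might touch.

// Assumption 1: adding a member is idempotent.
class Members {
  private ids: string[] = [];
  add(id: string): void {
    if (!this.ids.includes(id)) this.ids.push(id);
  }
  list(): string[] {
    return [...this.ids];
  }
}

// Assumption 2: totalling handles empty input.
function total(amounts: number[]): number {
  return amounts.reduce((sum, n) => sum + n, 0);
}

// Assumption 3: the call is retried on failure (synchronous for brevity).
function withRetry<T>(op: () => T, attempts: number): T {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let i = 0; i < attempts; i++) {
    try {
      return op();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Attack 1: call it twice with the same input.
const members = new Members();
members.add("u1");
members.add("u1");
const idempotentOk = members.list().length === 1;

// Attack 2: pass it [].
const emptyOk = total([]) === 0;

// Attack 3: "kill the network" with a fake transport that always throws,
// then count how many attempts were actually made.
let calls = 0;
const deadNetwork = (): string => {
  calls += 1;
  throw new Error("network down");
};
let gaveUp = false;
try {
  withRetry(deadNetwork, 3);
} catch {
  gaveUp = true;
}
const retryOk = gaveUp && calls === 3;

console.log({ idempotentOk, emptyOk, retryOk });
```

Each check maps to one line in the assumption log. Counting `calls` in the third attack matters: a wrapper that fails on the first error and one that really retries three times both end in an exception, and only the count tells them apart.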

The trap, specifically

None of these steps feel like progress. Reading existing code doesn't make a diff. Listing assumptions doesn't ship a feature. Trying to break your own work feels like generating negative work.

Meanwhile, generating the file feels productive. The file appears. It has lines. It looks done.

But the file is a byproduct. It's not the work. The work is the truth layer the file rests on. If that's solid, the file is correct because it can't be otherwise. If that's missing, the file is correct only by accident — and accidents don't survive contact with production.

What this looks like in a real session

When I'm at my best — when I remember to do this — a coding session with an LLM looks like:

1. Spend the first ten minutes reading. Not skimming. Actually reading. The file. The call sites. The types. The tests, if any.
2. Hold the inventory in mind: this is what's there, this is what it does, this is what calls it.
3. Frame the change as a delta on top of that inventory. “I need to add X. Given what's there, the cleanest place is Y, because of Z.”
4. Generate the diff. The LLM does the typing, but the diff is anchored to real things.
5. Read the diff line by line. For each non-obvious line, ask: what source justifies this? If the answer is “the model said so,” that line is a guess.
6. List the assumptions explicitly. Two or three are usually enough; most diffs don't rest on many.
7. Verification pass. For each assumption, find a way to test it. Edge case. Weird input. Concurrent call. Missing config.

Steps 1–3 and 5–7 are the truth layer. Step 4 is the generated file. Notice how lopsided the time allocation should be.

Why this matters more now

Two years ago, when LLM-generated code was visibly wrong about a third of the time, this was less of a trap. The costume was obviously a costume. Now the costume is good. Most of the lines are right. The structure is right. The naming convention matches. The lies are subtle. The diff looks right.

If you only check whether the code looks right, you'll merge code that looks right but doesn't survive contact with reality. The truth layer is what catches that. It's the only thing that catches that.

So stop treating the diff as the work. The diff is the byproduct. The work is making sure something solid lives underneath it — and that the diff has earned the place it's about to take.