Field Notes

Wall-Clock vs. Effort: Why I Started Quoting Two Estimates

I quoted "5-7 hours" for a prototype, delegated it to an agent, and finished in 30 minutes. The number wasn't wrong, but it didn't tell anyone what they were actually committing to. Here's the two-number rule I now use for every estimate over half an hour.

May 6, 2026 · 5 min read

I told someone a small UI prototype would take "5-7 hours." Then I delegated the whole thing to a sub-agent in a git worktree, walked away, and came back 30 minutes later to a finished diff. The person on the other side of that conversation had no idea what they were committing to. They thought they were getting an answer that night at the earliest. They could have had it by lunch.

The number wasn't wrong. The diff really was about five hours of focused human work. It was just a useless answer because I never said how I was going to produce it.

Since then I've stopped quoting one number. I quote two.

Wall-Clock and Effort Are Different Things

For any task with effort over thirty minutes or so, I now give:

  • Wall-clock estimate. How long before there's something to look at. This is what someone actually waits for. It drives decisions like "before the meeting?" and "tonight or tomorrow?"
  • Effort estimate. The size and complexity of the change itself. This is what drives review burden and "is the scope right for the bandwidth I have?"

These used to be the same number. Working with agents, they're not. A 5-7 hour effort can be a 30-45 minute wall-clock when it's delegated to a worktree and I'm doing something else while it runs. A 1-hour effort can be a 4-hour wall-clock if I'm context-switching between three other things.

The format I use now sounds like this:

"Wall-clock about 30-45 min via agent delegation. Effort-equivalent of 5-7 hours of focused human work, so the diff is going to be big and the review will be substantive."

Both numbers. The delegation strategy explicit. A hint at what the review looks like so the other person knows what arrives in their inbox.
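The two-number quote is easier to make a habit if it's a tiny structure instead of free-form text. Here's a minimal sketch of that idea; the class name, fields, and `quote()` method are all my own invention, not anything from a real tool:

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """One quote, two numbers. All names here are hypothetical."""
    wall_clock: str        # what the other person actually waits for, e.g. "30-45 min"
    effort: str            # size of the change itself, e.g. "5-7 hours of focused work"
    strategy: str          # how it gets produced: "inline" or "agent delegation"
    review_note: str = ""  # a hint at what arrives in the reviewer's inbox

    def quote(self) -> str:
        """Render both numbers, the strategy, and the review hint in one line."""
        parts = [
            f"Wall-clock ~{self.wall_clock} via {self.strategy}.",
            f"Effort-equivalent of {self.effort}.",
        ]
        if self.review_note:
            parts.append(self.review_note)
        return " ".join(parts)

# The prototype quote from the opening example, restated in this shape.
print(Estimate(
    wall_clock="30-45 min",
    effort="5-7 hours of focused human work",
    strategy="agent delegation",
    review_note="The diff will be big and the review substantive.",
).quote())
```

The point of the structure isn't the code; it's that a quote missing either number fails to construct, which is exactly the forcing function the rule needs.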

Always Name Whether You're Anchoring on Human-Time or Agent-Time

The miss in the prototype example wasn't that I quoted the wrong number. It was that I quoted a human-developer number and then silently delegated. If I'd said "5-7 hours if I do it inline, 30-45 minutes if I delegate to an agent in a worktree (recommended)," the person could have made an actual decision.

This matters because the choice between inline and agent-delegated isn't just speed. Inline work is incremental. I see issues mid-build, ask questions, course-correct. Agent-delegated work shows up complete: bigger upfront review, fewer course corrections, harder to redirect mid-stream. Different commitments, not just different timelines.

Quoting one number hides the choice. Quoting two surfaces it.

My Calibration Biases (Yours Will Differ)

After tracking misses for a while I can name where I'm wrong consistently. The patterns are durable enough that I now adjust at quote time:

  • Design, brand, and prototype work: I overestimate 3-10× when delegating. This is the category that bit me in the example. Visual work moves faster through agents than through me, partly because I treat design as "thinking time" when really the agent just iterates faster.
  • Debug, investigation, and integration: I underestimate 2-3×. Root causes surprise me. The canonical case: a row-shift bug I scoped at one hour turned out to be a classifier-bounds issue four levels upstream. Six hours later I was still pulling the thread.
  • Building features end-to-end: usually accurate within 2× either way. This is the category where my gut is calibrated.
  • Writing docs, specs, and runbooks: usually accurate. The shape of the work is predictable.

Yours will be different. The point isn't my biases; it's that knowing them lets you adjust at quote time instead of after the fact. If you're not tracking your misses, your next estimate is going to land in the same place as your last one.

The Pushback Test

When someone says "that feels off" about an estimate, the wrong move is to defend the number. The right move is to re-derive it from the calibration log: which category is this work in, what's the bias on that category, and is there a project-specific reason it should be different?

I keep the calibration log in project memory. Each project gets its own, because the biases shift with the work. A design-heavy project's miss patterns aren't the same as an analysis-heavy project's. The rule (two estimates, flag delegation) is universal. The data isn't.
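The re-derivation is mechanical once the log is keyed by project. A sketch of that lookup; the project names, the nesting, and the `rederive` helper are hypothetical, not the shape of any real memory system:

```python
# Hypothetical per-project calibration logs: same rule, different data.
# Each value is a (low, high) correction multiplier for that category.
CALIBRATION = {
    "design-heavy-project":   {"design": (1 / 10, 1 / 3), "docs": (1.0, 1.0)},
    "analysis-heavy-project": {"debug": (2.0, 3.0), "feature": (0.5, 2.0)},
}

def rederive(project: str, category: str, gut_hours: float) -> tuple[float, float]:
    """Answer 'that feels off' by recomputing from this project's log,
    not by defending the original gut number."""
    lo, hi = CALIBRATION[project][category]
    return (gut_hours * lo, gut_hours * hi)
```

Keeping the logs separate is what makes the pushback test honest: the same one-hour debug gut call re-derives differently on a project whose log says debugging runs long than on one with no such history.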

What This Doesn't Solve

This rule fixes a specific failure: the gap between what I commit to and what I actually deliver, when agents are in the loop. It doesn't fix scope creep. It doesn't fix unknown unknowns. It doesn't make my estimates more accurate in the average sense, just more honest about which kind of estimate I'm giving.

That alone is worth the friction. The next time I get asked "how long?" I won't pick the wrong number. I'll give both, name the strategy, and let the person on the other side decide what they're committing to.