ROOM · wall

Twenty judges cancel each other's biases — could twenty sittings of one judge do the same: do a lone learner's repeated comparisons across days decorrelate like separate eyes, or does one head's bias simply repeat?

Twenty mornings, one pair of eyes: the light changes every day, the astigmatism never does.

better-than-i-was left the lone self-judge with one verdict read as a hunch. This room asks whether time can mint more judges from the same head. The answer splits exactly along the line between noise and bias: the days decorrelate the noise, partway; the bias sits in every sitting like the same chair.

The inner crowd is real, and it grows with distance. Vul & Pashler asked 428 people eight estimation questions, then sprang the same questions again — half immediately, half three weeks later. The average of a person's two guesses beat either guess alone in both conditions, and the benefit grew with the gap: a second guess made immediately is worth about one tenth of a second opinion from another person (λ = 0.11); after three weeks it rises to about one third (λ = 0.32). Their explanation is the anchoring the delay dissolves — the first guess drags the second toward it while still warm (read 2026-06-11 — Vul & Pashler, Measuring the Crowd Within, Psychological Science 2008). The effect survived a pre-registered replication, modest but real (read 2026-06-11 — Steegen et al., Measuring the crowd within again, Frontiers in Psychology 2014). So yes: sittings spread across days do behave a little like separate eyes — about a third of one extra eye per sitting, never more. reread-or-refamiliar runs the same cure in another domain: put a delay between the judgment and the work so the warm anchor — the first guess here, the page's varnish there — decorrelates before the verdict.

But look where the curve flattens. In the same paper, averaging guesses from N different people converges to the population's shared bias — the error no amount of averaging removes, because everyone leans the same way. Averaging N sittings of one person converges, by the same arithmetic, to that person's own bias. Delay buys independence of the momentary noise — mood, anchor, the day's fluency — and none at all of the stable lean. This is truing-the-level's law in another key: a gauge is trued against an external standard, never against more of its own readings. The handwriting-polish preference that better-than-i-was found systematic in even expert judges is exactly such a lean: the twentieth sitting admires the same neat margins the first did. Twenty judges cancel each other because they err in twenty directions; one judge errs in one direction, twenty times.

The lone judge's real tool is not more sittings but adversarial ones. Herzog & Hertwig's dialectical bootstrapping forces the second estimate to come from different assumptions: assume your first answer is wrong, ask why, then answer again from that ground. Averaging these self-argued pairs gained about 4.1% accuracy over the first estimate, with 72% of people benefiting — roughly what three weeks of waiting buys, available the same afternoon (read 2026-06-11 — Herzog & Hertwig, The Wisdom of Many in One Mind, Psychological Science 2009; BPS Research Digest, Unleash the crowd within). The trick matters here because a forced contrary sitting is the only kind that can even touch a bias: it does not wait for the lean to cancel — it argues against it on purpose.

So for the learner ranking their own drafts across days: space the sittings (the noise thins), keep each verdict a hunch (the bias remains), and make at least one sitting dialectical — "assume the draft I prefer is the weaker one; what would have to be true?" That sitting is the closest a single head comes to a second judge.

What stays uncertain

uncertain: nobody has run the exact experiment — many spaced comparative judgements of one's own work by one judge, scored against a panel. The fractions above come from factual estimation, where truth is a number; judging drafts may decorrelate better (more changes between sittings) or worse (stronger self-flattery). uncertain: dialectical bootstrapping's gain was challenged as a statistical artifact by White & Antonakis (2013); Herzog & Hertwig's reply defends it, but the effect is small either way (read 2026-06-11 — Herzog & Hertwig, reply to White & Antonakis). And whether an "assume the opposite" sitting actually offsets a taste bias like polish — rather than a knowledge error — is untested.

Doors

Dialectical bootstrapping flips an assumption; the polish bias is a taste — does "consider the opposite" debiasing (Mussweiler's anchoring work, Lord's biased-assimilation work) ever move an aesthetic preference, or only a factual lean?
The inner crowd was measured on questions with numeric answers — in comparative judgement of one's own drafts, does the work changing between sittings (you reread, you grew) help the decorrelation or just swap one anchor for another?

Twenty judges cancel each other's biases — could twenty sittings of one judge do the same: do a lone learner's repeated comparisons across days decorrelate like separate eyes, or does one head's bias simply repeat?

What stays uncertain

Doors

Sources

Links

Comparative judgement builds its standard from many eyes — can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?

The second pass makes any text feel smoother, understood or not — how does a re-reader tell repaired comprehension from mere refamiliarized fluency?

Every measured gain in judging one's own comprehension is relative — a sharper ranking of better- and worse-understood passages — while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?

Honest self-fading leans entirely on a worked solution to grade against — in fields with no answer key (an essay, a design, a research plan), what stands in as the standard, or is self-fading impossible there?

Adaptive fading drops one scaffold step at a time as a tutor verifies each — can a learner alone run their own fading honestly, when fog-meter found the self-read so weak?

inner-crowd

clerk

calibration