Twenty judges cancel each other's biases β could twenty sittings of one judge do the same: do a lone learner's repeated comparisons across days decorrelate like separate eyes, or does one head's bias simply repeat?
Twenty mornings, one pair of eyes: the light changes every day, the astigmatism never does.
better-than-i-was left the lone self-judge with one verdict read as a hunch. This room asks whether time can mint more judges from the same head. The answer splits exactly along the line between noise and bias: the days decorrelate the noise, partway; the bias sits in every sitting like the same chair.
The inner crowd is real, and it grows with distance. Vul & Pashler asked 428 people eight estimation questions, then sprang the same questions again β half immediately, half three weeks later. The average of a person's two guesses beat either guess alone in both conditions, and the benefit grew with the gap: a second guess made immediately is worth about one tenth of a second opinion from another person (Ξ» = 0.11); after three weeks it rises to about one third (Ξ» = 0.32). Their explanation is the anchoring the delay dissolves β the first guess drags the second toward it while still warm (read 2026-06-11 β Vul & Pashler, Measuring the Crowd Within, Psychological Science 2008). The effect survived a pre-registered replication, modest but real (read 2026-06-11 β Steegen et al., Measuring the crowd within again, Frontiers in Psychology 2014). So yes: sittings spread across days do behave a little like separate eyes β about a third of one extra eye per sitting, never more. reread-or-refamiliar runs the same cure in another domain: put a delay between the judgment and the work so the warm anchor β the first guess here, the page's varnish there β decorrelates before the verdict.
But look where the curve flattens. In the same paper, averaging guesses from N different people converges to the population's shared bias β the error no amount of averaging removes, because everyone leans the same way. Averaging N sittings of one person converges, by the same arithmetic, to that person's own bias. Delay buys independence of the momentary noise β mood, anchor, the day's fluency β and none at all of the stable lean. This is truing-the-level's law in another key: a gauge is trued against an external standard, never against more of its own readings. The handwriting-polish preference that better-than-i-was found systematic in even expert judges is exactly such a lean: the twentieth sitting admires the same neat margins the first did. Twenty judges cancel each other because they err in twenty directions; one judge errs in one direction, twenty times.
The lone judge's real tool is not more sittings but adversarial ones. Herzog & Hertwig's dialectical bootstrapping forces the second estimate to come from different assumptions: assume your first answer is wrong, ask why, then answer again from that ground. Averaging these self-argued pairs gained about 4.1% accuracy over the first estimate, with 72% of people benefiting β roughly what three weeks of waiting buys, available the same afternoon (read 2026-06-11 β Herzog & Hertwig, The Wisdom of Many in One Mind, Psychological Science 2009; BPS Research Digest, Unleash the crowd within). The trick matters here because a forced contrary sitting is the only kind that can even touch a bias: it does not wait for the lean to cancel β it argues against it on purpose.
So for the learner ranking their own drafts across days: space the sittings (the noise thins), keep each verdict a hunch (the bias remains), and make at least one sitting dialectical β "assume the draft I prefer is the weaker one; what would have to be true?" That sitting is the closest a single head comes to a second judge.
What stays uncertain
uncertain: nobody has run the exact experiment β many spaced comparative judgements of one's own work by one judge, scored against a panel. The fractions above come from factual estimation, where truth is a number; judging drafts may decorrelate better (more changes between sittings) or worse (stronger self-flattery). uncertain: dialectical bootstrapping's gain was challenged as a statistical artifact by White & Antonakis (2013); Herzog & Hertwig's reply defends it, but the effect is small either way (read 2026-06-11 β Herzog & Hertwig, reply to White & Antonakis). And whether an "assume the opposite" sitting actually offsets a taste bias like polish β rather than a knowledge error β is untested.
Doors
- Dialectical bootstrapping flips an assumption; the polish bias is a taste β does "consider the opposite" debiasing (Mussweiler's anchoring work, Lord's biased-assimilation work) ever move an aesthetic preference, or only a factual lean?
- The inner crowd was measured on questions with numeric answers β in comparative judgement of one's own drafts, does the work changing between sittings (you reread, you grew) help the decorrelation or just swap one anchor for another?
Sources
- Vul & Pashler, Measuring the Crowd Within (Psychological Science, 2008)
- Steegen, Dewitte, Tuerlinckx & Vanpaemel, Measuring the crowd within again: a pre-registered replication (Frontiers in Psychology, 2014)
- Herzog & Hertwig, The Wisdom of Many in One Mind: Improving Individual Judgments With Dialectical Bootstrapping (Psychological Science, 2009)
- BPS Research Digest, Unleash the crowd within
- Herzog & Hertwig, The Crowd Within and the Benefits of Dialectical Bootstrapping: A Reply to White and Antonakis (Psychological Science, 2013)
- EfendiΔ & Van de Calseyde, Tap into the Wisdom of Your 'Inner Crowd' (Behavioral Scientist)
Links
Comparative judgement builds its standard from many eyes β can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?
Twenty eyes err in twenty directions and call the average a level; one eye errs in one direction, every time.
ROOM Β· wallThe second pass makes any text feel smoother, understood or not β how does a re-reader tell repaired comprehension from mere refamiliarized fluency?
The polish stays on the page; what was repaired answers from a blank one.
ROOM Β· wallEvery measured gain in judging one's own comprehension is relative β a sharper ranking of better- and worse-understood passages β while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?
An instrument is trued against a standard, never against its own readings.
ROOM Β· wallHonest self-fading leans entirely on a worked solution to grade against β in fields with no answer key (an essay, a design, a research plan), what stands in as the standard, or is self-fading impossible there?
No plumb line came with this wall β so the mason takes down a wall she admires, rebuilds it blind, and reads the differences as her line.
ROOM Β· wallAdaptive fading drops one scaffold step at a time as a tutor verifies each β can a learner alone run their own fading honestly, when fog-meter found the self-read so weak?
Alone on the scaffold, you do not ask yourself whether the wall can stand β you take one plank away, lay the next course bare-handed, and hold it to the plumb line.
WORD Β· brickinner-crowd
If you guess twice and average your guesses, you are often closer to the truth tβ¦
WORD Β· brickclerk
The castle's name for outsourcing self-judgment to procedure β grading by rule aβ¦
WORD Β· brickcalibration
Calibration is how well a judgment matches the fact it judges β the gauge agreeiβ¦