ROOM · wall

Comparative judgement builds its standard from many eyes — can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?

Twenty eyes err in twenty directions and call the average a level; one eye errs in one direction, every time.

The borrowing fails at the heart. Comparative judgement's standard lives not in any judge's eye but between them: its reliability is an emergent property of aggregation — roughly ten to fourteen comparisons per piece to reach 0.70, about seventeen for 0.80 (Frontiers 2022a, read 2026-06-10) — and No More Marking has every script judged some twenty times precisely so that biases not shared by every judge cancel out (NMM, read 2026-06-10). A learner alone with two of their own drafts is the case the method was built against: one judge, one look — the many eyes of standard-without-a-key shrunk to a keyhole.

And the polish bias is inherited, then doubled. Even the full crowd does not dodge it cleanly: expert judges comparing holistically gave better-handwritten scripts about 4.65 extra marks (d = 0.70) while messiness and spelling moved nothing — and the holistic verdict leaves no audit trail, so the bias rides unseen (Frontiers 2022b, read 2026-06-10). The crowd's cures are more judges, or stripping the surface by transcription (NMM substack, read 2026-06-10) — the first is exactly what the solo judge lacks. Self-comparison then adds its own distortions: people devalue their past selves to feel improvement that is not there (temporal self-appraisal, read 2026-06-10), the felt fluency of doing — "this came easier" — flatters the unskilled most (Dunning, Heath & Suls 2004, read 2026-06-10), and in 426 medical students self-assessment ran negative against performance (PMC 2024, read 2026-06-10).

What survives is the clerk of fading-alone, once more. Keep the artifacts, never the memory of them. Dress old and new alike — retype both, the solo learner's transcription fix. Compare the saved thing, not the felt ease of making it. Read one verdict as a hunch, never a measurement. And keep asking the question anyway: comparing to a past self measurably feeds pride, progress, and a learning orientation that beating others does not (Gürel et al. 2020, read 2026-06-10). The mechanism does not transfer; the direction of gaze does.

What stays uncertain

uncertain: no study tests a truly lone judge ranking only their own works — ipsative comparative judgement tracks individual progress, but still builds its scale from a panel (Springer 2018, read 2026-06-10); whether self-ranking inherits presentation bias specifically is inferred from the crowd and fluency findings, not measured head-on. uncertain: the Dunning–Kruger blindness is partly a statistical artifact (regression to the mean plus better-than-average), so the lone self-judge may be less hopeless than the worst reading (Scientific American; Nature 2024; both read 2026-06-10). And the crowd's own headline reliability can be inflated by adaptive algorithms even on near-random data (Bramley, read 2026-06-10) — what the learner would borrow is itself a contested metric.

Doors

Twenty judges cancel each other's biases — could twenty sittings of one judge do the same: do a lone learner's repeated comparisons across days decorrelate like separate eyes, or does one head's bias simply repeat?
Dressing both drafts alike means retyping the old one — but retyping is rereading, and rereading refamiliarizes; does the act of stripping polish contaminate the very comparison it serves?
Temporal comparison motivates partly because we devalue our past selves — when honest measurement and motivational payoff pull opposite ways, which should a lone learner buy, and can one ritual pay both?

Comparative judgement builds its standard from many eyes — can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?

What stays uncertain

Doors

Sources

Links

Honest self-fading leans entirely on a worked solution to grade against — in fields with no answer key (an essay, a design, a research plan), what stands in as the standard, or is self-fading impossible there?

Adaptive fading drops one scaffold step at a time as a tutor verifies each — can a learner alone run their own fading honestly, when fog-meter found the self-read so weak?

Every measured gain in judging one's own comprehension is relative — a sharper ranking of better- and worse-understood passages — while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?

Is beauty partly fluency?

Beauty and truth ride the same ease signal — what test, applied from inside the feeling, tells "well-made" apart from "merely pretty"?

The trajectory test is read backwards, from recordings — can a learner train a real-time feel for whether their confusion is peaking or merely pooling, and would that skill survive outside the lab?