Comparative judgement builds its standard from many eyes — can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?
Twenty eyes err in twenty directions and call the average a level; one eye errs in one direction, every time.
The borrowing fails at the heart. Comparative judgement's standard lives not in any judge's eye but between them: its reliability is an emergent property of aggregation — roughly ten to fourteen comparisons per piece to reach 0.70, about seventeen for 0.80 (Frontiers 2022a, read 2026-06-10) — and No More Marking has every script judged some twenty times precisely so that biases not shared by every judge cancel out (NMM, read 2026-06-10). A learner alone with two of their own drafts is the case the method was built against: one judge, one look — the many eyes of standard-without-a-key shrunk to a keyhole.
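A toy simulation makes "reliability is an emergent property of aggregation" concrete. Everything in it is an assumption. Each script gets a hypothetical true quality. Each judge adds a private taste for each script that persists across that judge's looks. A script's standing is its raw win count, not the Bradley–Terry/Rasch scale that real comparative-judgement tools fit. The score reported is agreement with the hidden true quality (Spearman's rho). That is not the scale-separation reliability behind the 0.70 and 0.80 figures above, so the numbers cannot be compared with them.

```python
import random


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            result[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def agreement_with_truth(looks, n_judges, seed, n_scripts=40, taste_sd=1.0, noise_sd=0.5):
    """Run `looks` rounds; in each round every script meets one random partner,
    judged by a randomly chosen judge. Returns Spearman(wins, true quality)."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_scripts)]
    # Hypothetical: each judge carries a private, persistent taste for every script.
    taste = [[rng.gauss(0, taste_sd) for _ in range(n_scripts)] for _ in range(n_judges)]
    wins = [0] * n_scripts
    for _ in range(looks):
        order = list(range(n_scripts))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            judge = taste[rng.randrange(n_judges)]
            seen_a = quality[a] + judge[a] + rng.gauss(0, noise_sd)
            seen_b = quality[b] + judge[b] + rng.gauss(0, noise_sd)
            wins[a if seen_a > seen_b else b] += 1
    # Every script appears exactly `looks` times, so raw wins rank like win rates.
    return spearman(wins, quality)


def mean_agreement(looks, n_judges, seeds=range(20)):
    return sum(agreement_with_truth(looks, n_judges, s) for s in seeds) / len(seeds)


lone_once = mean_agreement(looks=1, n_judges=1)    # one judge, one look
lone_often = mean_agreement(looks=20, n_judges=1)  # one judge, twenty looks
crowd = mean_agreement(looks=20, n_judges=20)      # twenty judges, twenty looks
print(f"lone once {lone_once:.2f} | lone often {lone_often:.2f} | crowd {crowd:.2f}")
```

In this model, more looks by the same judge sharpen the estimate of quality plus that judge's taste, not of quality alone. Spreading the looks across judges lets the private tastes wash out. Whether a real learner's taste is as fixed as the model assumes is the open question under Doors.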
And the polish bias is inherited, then doubled. Even the full crowd does not dodge it cleanly: expert judges comparing holistically gave better-handwritten scripts about 4.65 extra marks (d = 0.70), while messiness and spelling moved nothing. The holistic verdict also leaves no audit trail, so the bias rides unseen (Frontiers 2022b, read 2026-06-10). The crowd's cures are more judges, or stripping the surface by transcription (NMM substack, read 2026-06-10); the first is exactly what the solo judge lacks. Self-comparison then adds its own distortions. People devalue their past selves to feel an improvement that is not there (temporal self-appraisal, read 2026-06-10). The felt fluency of doing, the sense that "this came easier", flatters the unskilled most (Dunning, Heath & Suls 2004, read 2026-06-10). And in 426 medical students, self-assessment ran negative against performance (rho ≈ −0.59; PMC 2024, read 2026-06-10).
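The same kind of toy model shows why a lone judge is stuck with a surface bias. Assume, hypothetically, that half the scripts look neater than the rest, independent of their quality. Assume also that every judgement adds the same bonus to a neat script: 0.8 on the latent scale, an arbitrary choice, not the d = 0.70 effect size. A bias carried by every judgement is the limiting case for a crowd. For a lone learner it is the everyday case, because all the judgements are theirs. Setting the surface term to zero is the model's stand-in for transcription. The names `neat_minus_plain` and `mean_gap` are illustrative only.

```python
import random


def neat_minus_plain(looks, surface_weight, seed, n_scripts=60, n_judges=20):
    """Mean win rate of 'neat' scripts minus that of 'plain' ones.

    Hypothetical model: half the scripts carry a polished surface that is
    independent of quality, and every judgement adds the same `surface_weight`
    to a neat script, so the bias is shared by all judgements."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_scripts)]
    neat = [i % 2 == 0 for i in range(n_scripts)]
    # Private tastes still differ between judges; only the surface bias is shared.
    taste = [[rng.gauss(0, 0.5) for _ in range(n_scripts)] for _ in range(n_judges)]
    wins = [0] * n_scripts
    for _ in range(looks):
        order = list(range(n_scripts))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            judge = taste[rng.randrange(n_judges)]
            seen = {
                i: quality[i] + judge[i] + (surface_weight if neat[i] else 0.0) + rng.gauss(0, 0.5)
                for i in (a, b)
            }
            wins[a if seen[a] > seen[b] else b] += 1
    # Each script appears exactly `looks` times, so these are mean win rates.
    neat_rate = sum(w for w, n in zip(wins, neat) if n) / (looks * neat.count(True))
    plain_rate = sum(w for w, n in zip(wins, neat) if not n) / (looks * neat.count(False))
    return neat_rate - plain_rate


def mean_gap(looks, surface_weight, seeds=range(30)):
    return sum(neat_minus_plain(looks, surface_weight, s) for s in seeds) / len(seeds)


twenty_looks = mean_gap(looks=20, surface_weight=0.8)
two_hundred_looks = mean_gap(looks=200, surface_weight=0.8)
transcribed = mean_gap(looks=20, surface_weight=0.0)  # retyped: no script looks neater
print(f"20 looks {twenty_looks:.3f} | 200 looks {two_hundred_looks:.3f} | transcribed {transcribed:.3f}")
```

Under these assumptions, ten times as many looks leave the neat-over-plain gap where it was. Only removing the surface difference closes it.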
What survives is the clerk of fading-alone, once more. Keep the artifacts, never the memory of them. Dress old and new alike — retype both, the solo learner's transcription fix. Compare the saved thing, not the felt ease of making it. Read one verdict as a hunch, never a measurement. And keep asking the question anyway: comparing to a past self measurably feeds pride, progress, and a learning orientation that beating others does not (Gürel et al. 2020, read 2026-06-10). The mechanism does not transfer; the direction of gaze does.
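These habits are practices, not an algorithm, but the "dress alike" step can be sketched for typed drafts. This minimal Python version rests on stated assumptions. `dress_alike` and `blind_pair` are hypothetical helpers. "Retyping" is approximated by stripping a few markdown marks and collapsing whitespace. Showing the two drafts under shuffled labels A and B goes beyond the text: it is an added precaution, meant to keep the memory of which draft is which out of the verdict.

```python
import random
import re


def dress_alike(text):
    """Hypothetical 'retype' for typed drafts: drop simple markdown marks
    (*, _, #, >, `) and collapse all whitespace, so neither looks more finished."""
    text = re.sub(r"[*_#>`]+", "", text)
    return " ".join(text.split())


def blind_pair(old, new, rng):
    """Show both saved drafts, dressed alike, under neutral labels A and B.
    The key saying which is which stays sealed until a verdict is written down."""
    drafts = [("old", dress_alike(old)), ("new", dress_alike(new))]
    rng.shuffle(drafts)
    shown = {label: text for label, (_, text) in zip("AB", drafts)}
    key = {label: age for label, (age, _) in zip("AB", drafts)}
    return shown, key


rng = random.Random()  # unseeded, so the learner cannot predict the labels
old_draft = "# Plan\n**Aim:** test   the idea."
new_draft = "Plan\nAim: test the idea, then a control."
shown, key = blind_pair(old_draft, new_draft, rng)
for label in "AB":
    print(label, "|", shown[label])
verdict = "A"  # the learner's pick, recorded before opening the key
print("picked the", key[verdict], "draft: a hunch, not a measurement")
```

The regex is crude: it also deletes underscores inside words. Hidden labels also cannot hide a draft you recognize on sight, and the verdict this records is still one look.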
What stays uncertain
uncertain: no study tests a truly lone judge ranking only their own works. Ipsative comparative judgement tracks individual progress but still builds its scale from a panel (Springer 2018, read 2026-06-10); whether self-ranking inherits presentation bias specifically is inferred from the crowd and fluency findings, not measured head-on.
uncertain: the Dunning–Kruger blindness is partly a statistical artifact (regression to the mean plus the better-than-average effect), so the lone self-judge may be less hopeless than the worst reading suggests (Scientific American; Scientific Reports 2024; both read 2026-06-10). And the crowd's own headline reliability can be inflated by adaptive algorithms even on near-random data (Bramley, read 2026-06-10), so what the learner would borrow is itself a contested metric.
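A short simulation shows how the artifact argument works. Even when self-estimates carry no information about performance at all, grouping people by actual score reproduces the familiar Dunning–Kruger pattern. The 1,000 people, the uniform scores, and the 15-point better-than-average boost are illustrative assumptions, not figures from the cited sources.

```python
import random

rng = random.Random(1)
n = 1000
# Hypothetical population: actual percentile and self-estimate are drawn
# independently (zero insight), and everyone adds the same better-than-average
# boost of 15 points, capped at 100.
actual = [rng.uniform(0, 100) for _ in range(n)]
self_estimate = [min(100.0, rng.uniform(0, 100) + 15) for _ in range(n)]

# Sort people by actual score and split into quartiles, as Dunning-Kruger plots do.
people = sorted(zip(actual, self_estimate))
size = n // 4
gaps = []
for q in range(4):
    group = people[q * size:(q + 1) * size]
    mean_actual = sum(a for a, _ in group) / size
    mean_self = sum(s for _, s in group) / size
    gaps.append(mean_self - mean_actual)
    print(f"quartile {q + 1}: actual {mean_actual:5.1f}  self {mean_self:5.1f}  gap {gaps[-1]:+5.1f}")
```

The bottom quartile appears to overestimate the most and the top quartile to underestimate. This happens purely because sorting on one variable makes an unrelated one regress to its mean. Real self-assessments carry some genuine signal as well, which is why the blindness counts as partly, not wholly, an artifact.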
Doors
- Twenty judges cancel each other's biases — could twenty sittings of one judge do the same: do a lone learner's repeated comparisons across days decorrelate like separate eyes, or does one head's bias simply repeat?
- Dressing both drafts alike means retyping the old one — but retyping is rereading, and rereading refamiliarizes; does the act of stripping polish contaminate the very comparison it serves?
- Temporal comparison motivates partly because we devalue our past selves — when honest measurement and motivational payoff pull opposite ways, which should a lone learner buy, and can one ritual pay both?
Sources
- Reliability of comparative judgement as a function of comparisons per piece (Frontiers in Education, 2022)
- Construct-irrelevant features in comparative judgement: handwriting d=0.70, no audit trail (Frontiers in Education, 2022)
- No More Marking — handwriting bias and why ~20 judgements cancel it
- No More Marking — can AI solve handwriting bias? (transcription before judging)
- Ipsative adaptive comparative judgement for individual progress (Int. J. of Technology & Design Education, 2018)
- Self-assessment accuracy in 426 first-semester medical students: rho ≈ −0.59 (PMC, 2024)
- Dunning, Heath & Suls — flawed self-assessment and fluency cues (Psychological Science in the Public Interest, 2004)
- Temporal self-appraisal: devaluing the past self to feel improvement (PMC, 2021)
- Gürel et al. — temporal comparison feeds pride and a sense of progress (JEP: General, 2020)
- The Dunning–Kruger effect isn't what you think it is (Scientific American)
- Dunning–Kruger as partly statistical artifact (Scientific Reports, 2024)
- Bramley — adaptive algorithms can inflate ACJ's reliability statistic (Cambridge Assessment)
Links
Honest self-fading leans entirely on a worked solution to grade against — in fields with no answer key (an essay, a design, a research plan), what stands in as the standard, or is self-fading impossible there?
No plumb line came with this wall — so the mason takes down a wall she admires, rebuilds it blind, and reads the differences as her line.
Adaptive fading drops one scaffold step at a time as a tutor verifies each — can a learner alone run their own fading honestly, when fog-meter found the self-read so weak?
Alone on the scaffold, you do not ask yourself whether the wall can stand — you take one plank away, lay the next course bare-handed, and hold it to the plumb line.
Every measured gain in judging one's own comprehension is relative — a sharper ranking of better- and worse-understood passages — while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?
An instrument is trued against a standard, never against its own readings.
Is beauty partly fluency?
The smooth path feels true underfoot — and lovely to the eye. Same path, same ease.
Beauty and truth ride the same ease signal — what test, applied from inside the feeling, tells "well-made" apart from "merely pretty"?
Gilt and gold gleam alike in passing light; only one survives the scratch.
The trajectory test is read backwards, from recordings — can a learner train a real-time feel for whether their confusion is peaking or merely pooling, and would that skill survive outside the lab?
You cannot sound the fog from inside it — but you can notice that your feet have stopped, or that they only circle.