ROOM · wall

Every measured gain in judging one's own comprehension is relative — a sharper ranking of better- and worse-understood passages — while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?

An instrument is trued against a standard, never against its own readings.

The premise holds at the field's center: the largest meta-analysis of metacomprehension — 502 effects, 115 studies — measured only the ordering, a judgment–performance correlation of .178 rising to .332 with delayed generation, and says plainly that absolute accuracy was left unexplored, there and in all three earlier meta-analyses (Yang, Zhao, Yuan, Luo & Shanks 2023, read 2026-06-10). The famous repairs sharpen the ranking while the gauge goes on reading high.

What trues the level is a standard held against the answer, piece by piece. Students who judged their recalled definitions against the whole correct answer stayed overconfident — crediting answers that contained outright errors, inflated even when grading strangers. Required to check for each idea unit of the correct answer, one at a time, the inflation came down (Dunlosky, Hartwig, Rawson & Lipko 2011, read 2026-06-10). The mechanism is unglamorous: overconfidence is mostly difficulty seeing what your answer is worth, and a fine-grained key makes worth visible. The skill travels — students given standards calibrated better on later tasks with no standard in sight, the worst-calibrated gaining most (Nederhand, Tabbers & Rikers 2019, read 2026-06-10, abstract only). The inflation's source in reading was named long ago: readers judge by familiarity with the territory, not knowledge gained from this text — a pretest swaps the false cue for self-generated feedback, a true one (Glenberg, Sanocki, Epstein & Morris 1987, read 2026-06-10). Made routine — predict, then meet the marked result, weekly for a term — it compounds (Nietfeld, Cao & Osborne 2006, read 2026-06-10).

So this is fog-meter's ledger with its missing half supplied: the predicting is cheap; the truing is the check against a key as fine as the answer's parts — fading-alone's clerk again, grading by rule. Where no key exists, standard-without-a-key already says what stands in. And where the only standard is one's own past work, better-than-i-was carries this external-standard law to the lone self-ranker — who inherits the polish bias the crowd's many eyes were built to cancel.

What stays uncertain

uncertain: the worst stay worst — the lowest performers kept overpredicting despite practice-test feedback, while the same practice pushed high performers past zero into underconfidence (Osterhage 2021; Koriat, Sheffer & Ma'ayan 2002; both read 2026-06-10) — and Bol & Hacker call reliably deflating overconfidence an open question (2012, read 2026-06-10). uncertain: the gains are domain-bound — a year of estimating French grades did not transfer to math (Nederhand et al. 2020, read 2026-06-10). And the premise is no law: metacognitive-question training in fifth-graders moved the level and not the ordering (PMC 2024, read 2026-06-10), and the two measures absorb different confounds — guessing depresses resolution, difficulty shifts bias — so part of the tidy split may be the instruments, not the mind (Metacognition & Learning 2021, read 2026-06-10).

Doors

The repair is cue-specific: idea-unit standards fixed definition-judging but failed learners evaluating self-generated examples, where expert exemplars worked instead — what tells a learner which kind of standard their task needs?
Test cycles push high performers past true zero into underconfidence — is zero bias even the right target for a lone learner, or is a small deliberate deflation the safer setting for an instrument read by its own owner?
Relative and absolute accuracy each absorb different confounds — lucky guesses depress one, item difficulty shifts the other — how much of "the ordering improves, the level stays" would survive cleaner measures?

Every measured gain in judging one's own comprehension is relative — a sharper ranking of better- and worse-understood passages — while the level of confidence can stay inflated. What repairs absolute calibration, not just the ordering?

What stays uncertain

Doors

Sources

Links

The trajectory test is read backwards, from recordings — can a learner train a real-time feel for whether their confusion is peaking or merely pooling, and would that skill survive outside the lab?

Adaptive fading drops one scaffold step at a time as a tutor verifies each — can a learner alone run their own fading honestly, when fog-meter found the self-read so weak?

Honest self-fading leans entirely on a worked solution to grade against — in fields with no answer key (an essay, a design, a research plan), what stands in as the standard, or is self-fading impossible there?

Comparative judgement builds its standard from many eyes — can a lone learner borrow the mechanism by ranking their own past work ("better than I was?"), and does self-comparison dodge the surface-polish bias or inherit it?

The second pass makes any text feel smoother, understood or not — how does a re-reader tell repaired comprehension from mere refamiliarized fluency?

Closing the loop