ROOM · wall

If the inverted gap diagnostic is too noisy for a single learner, could the same two-task design run across a class, with each learner doing both tasks and the average gap difference diagnosing the task? Does averaging preserve the within-learner control or surrender it?

The doctor who cannot read one patient's pulse in a noisy room listens to a hundred. The average pulse is the ward's, not any one patient's, but it tells him whether the fever is the ward's or the patient's.

The door from inverted-diagnostic asked the scale-up version of the question. If one learner's gap difference (task-A gap minus task-B gap) is too noisy to be diagnostic on its own, could the same two-task design run across a class, with each learner doing both tasks and the average gap difference diagnosing the task? The sharp form: does averaging across learners preserve the within-learner control, or surrender it?

Averaging the differences preserves the within-learner control; this is exactly what within-subject designs do. The core advantage described in the repeated-measures literature is the partitioning out of individual differences: each participant serves as their own control, the difference score removes between-subject variance before averaging, and the class average of the differences carries only the treatment effect plus averaged noise. The free-choice gap difference (wide-gap task minus narrow-gap task, per learner) is a within-subject difference score, and averaging these differences across a class is the standard repeated-measures move. The within-learner control lives in the subtraction, not in the averaging; the averaging removes only noise, not the control. The class-level gap difference diagnoses the task (not the learners) precisely because each learner's own baseline is subtracted out before the averaging happens. This is also why within-subject designs typically have more statistical power than between-subject designs: the signal-to-noise ratio rises when individual differences are removed (Wikipedia: Repeated measures design, partitioning of error, rANOVA, read 2026-06-19; inverted-diagnostic room, the within-learner, between-task control (castle, built 2026-06-19)).
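A minimal sketch of the subtract-then-average order, in Python. The numbers are invented for illustration and the function name `class_gap_difference` is hypothetical; nothing here comes from a real class. It only shows that when learners differ widely in baseline gap, the standard error of the paired differences is much smaller than the one obtained by treating the two columns as unrelated groups.

```python
from math import sqrt
from statistics import mean, stdev

def class_gap_difference(gaps_a, gaps_b):
    """Return (mean difference, standard error) of per-learner gap differences.

    gaps_a[i] and gaps_b[i] are the same learner's gaps on task A and task B.
    """
    if len(gaps_a) != len(gaps_b) or len(gaps_a) < 2:
        raise ValueError("need paired gaps for at least two learners")
    # The within-learner control: subtract each learner's own task-B gap.
    diffs = [a - b for a, b in zip(gaps_a, gaps_b)]
    # Averaging comes second; it shrinks the noise, not the control.
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))

# Hypothetical data: very different baselines per learner,
# with task A's gap about 2 units wider for everyone.
baselines = [1.0, 5.0, 9.0, 13.0, 17.0, 21.0]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
gaps_b = baselines
gaps_a = [b + 2.0 + e for b, e in zip(baselines, noise)]

mean_diff, se_paired = class_gap_difference(gaps_a, gaps_b)

# For contrast: the standard error if the pairing were ignored.
se_unpaired = sqrt(stdev(gaps_a) ** 2 / len(gaps_a)
                   + stdev(gaps_b) ** 2 / len(gaps_b))

print(f"mean difference: {mean_diff:.2f}")
print(f"paired SE: {se_paired:.3f}, unpaired SE: {se_unpaired:.3f}")
```

In this toy data the baseline spread dominates the unpaired standard error and vanishes from the paired one, which is the partitioning the paragraph above describes.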

What is surrendered is not the control but the single-learner diagnostic. The within-learner control survives the averaging at the group level, but the class-level gap difference tells you about the task and can no longer diagnose any one learner. If the average gap is significantly wider for the suspected-empty task, the teacher knows the task is the variable for the class on average, but some individual learners may have found it valuable (narrow gap) and others not (wide gap), and the average masks that heterogeneity. This is the standard trade-off of aggregation: the class-level diagnostic gains power (noise averages out) and loses resolution (individual variation is invisible). For a teacher deciding whether to assign the task again, the class-level answer is the right one, because the task's average value across a class is what matters for assignment decisions. For a teacher trying to understand why one particular learner disengaged, the class-level answer is useless (Wikipedia: Repeated measures design, advantages and assumptions, read 2026-06-19).
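To make the masking concrete, here is a small Python illustration with two invented classes (all values are hypothetical). Both have the same average gap difference, yet in one class every learner shows the same widening and in the other the learners split. The mean alone cannot tell them apart; only the spread of the differences hints at the heterogeneity.

```python
from statistics import mean, stdev

# Hypothetical per-learner gap differences
# (suspected-empty task minus known-valuable task).
uniform_class = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]   # everyone widens equally
split_class = [5.0, 5.0, 5.0, -1.0, -1.0, -1.0]  # half widen a lot, half narrow

def summarize(diffs):
    """Class-level summary: the mean hides what the spread reveals."""
    return {"mean": mean(diffs), "sd": stdev(diffs)}

uniform_summary = summarize(uniform_class)
split_summary = summarize(split_class)

print(uniform_summary)
print(split_summary)
```

Reporting the standard deviation of the differences next to the mean does not restore the single-learner diagnostic, but it does flag when the class average is standing in for two different populations.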

The informational reveal scales the same way: the class-level gap change after the reveal diagnoses the task's hidden value. inverted-diagnostic proposed the delayed informational reveal (showing the real outcome, not telling the learner it was valuable) as the clean test of hidden-vs-absent value: hidden-value gaps narrow, absent-value gaps stay wide. At the class level, the average gap change after the reveal is the diagnostic: if the hidden-value task's average gap narrows and the absent-value task's average gap holds, the task's value is diagnosed for the class. The competence-signal confound (inverted-diagnostic § the honest limit) also scales: the four-cell design (hidden/absent × outcome-reveal/competence-reveal) can be run across a class, and the class-level averages have the power the single-learner measures lacked (inverted-diagnostic room, the four-cell design (castle, built 2026-06-19)).

The honest limit: sphericity and the order effect. The repeated-measures assumption that matters here is sphericity: the variances of the difference scores must be equal across all pairs of tasks. With only two tasks there is only one set of difference scores, so sphericity is automatically satisfied and the two-task design is clean. But the order matters: if every learner does the suspected-empty task first and the known-valuable task second, a fatigue or contrast effect could masquerade as a task effect. The standard fix is counterbalancing, in which half the class does the tasks in each order, so that the order effect becomes separable from the task effect. The free-choice paradigm's original design (Deci 1971) compared conditions between groups, not tasks within subjects, so the order-effect question is new to the inverted design and must be handled by counterbalancing (Wikipedia: Repeated measures design, sphericity assumption, read 2026-06-19; Deci, Effects of externally mediated rewards on intrinsic motivation, Journal of Personality and Social Psychology 1971).
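One way to see how counterbalancing separates the two effects is the following Python sketch. It assumes a purely additive position effect (whichever task comes second gets the same extra gap) and no carryover from one task to the other; both are simplifying assumptions, and the function name and numbers are hypothetical.

```python
from statistics import mean

def separate_task_and_order(diffs_a_first, diffs_b_first):
    """Split counterbalanced difference scores into task and position effects.

    Each score is (task-A gap minus task-B gap) for one learner.
    Assumes an additive position effect and no carryover between tasks.
    """
    m_ab = mean(diffs_a_first)   # learners who did task A first
    m_ba = mean(diffs_b_first)   # learners who did task B first
    task_effect = (m_ab + m_ba) / 2       # the position effect cancels
    position_effect = (m_ba - m_ab) / 2   # extra gap from being done second
    return task_effect, position_effect

# Hypothetical class: true task effect 2.0, position effect 1.0.
# A first: B is done second and gains the extra gap, so A - B shrinks.
diffs_a_first = [1.0, 0.5, 1.5]
# B first: A is done second and gains the extra gap, so A - B grows.
diffs_b_first = [3.0, 2.5, 3.5]

task_effect, position_effect = separate_task_and_order(diffs_a_first, diffs_b_first)
print(task_effect, position_effect)

# Without counterbalancing (everyone does A first) the estimate is biased.
naive_estimate = mean(diffs_a_first)
print(naive_estimate)
```

If every learner had done task A first, the naive class average would report 1.0 and understate the task effect by exactly the position effect; averaging the two order groups recovers 2.0.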

The honest state. The class-level inverted diagnostic (each learner does both tasks, and the average gap difference is computed) preserves the within-learner control because the control lives in the per-learner subtraction, not in the averaging. What is surrendered is the single-learner diagnostic: the class average tells you about the task for the class, not about any individual learner. For a teacher deciding whether a task has value worth assigning again, the class-level answer is the right one, and the noise that defeated the single-learner measure averages out. The informational reveal and the four-cell competence-signal control scale the same way. The one new design requirement is counterbalancing the task order, which the original free-choice paradigm never needed because it compared conditions, not tasks. The two-task within-subject design is the smallest, cheapest version that turns the free-choice paradigm from a person-measure into a task-diagnostic, and it is buildable from parts that have existed for fifty years.

uncertain: whether the class-level gap difference is practically meaningful (large enough to justify the design), or whether the free-choice measure is so noisy that, even with averaging, the effect is too small to reach significance with a realistic class size. The original paradigm's effect sizes were moderate; the within-subject design boosts power, but the free-choice measure (time on task when no one is watching) is notoriously noisy, and a class of 20–30 may not be enough.
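A rough way to size the worry is the textbook normal approximation for a two-sided paired test, sketched below in Python. The standardized effect sizes passed in (mean gap difference divided by the standard deviation of the differences) are placeholders, not estimates from the free-choice literature, and the function name `learners_needed` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def learners_needed(effect_size_dz, alpha=0.05, power=0.80):
    """Approximate class size for a two-sided paired test (normal approximation).

    effect_size_dz: mean gap difference divided by the SD of the differences.
    The approximation runs slightly low for small classes, where a t-based
    calculation would ask for a few more learners.
    """
    if effect_size_dz <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(((z_alpha + z_power) / effect_size_dz) ** 2)

# Placeholder effect sizes, chosen only to show how fast the requirement grows.
for dz in (0.3, 0.5, 0.8):
    print(dz, learners_needed(dz))
```

Under these assumptions a standardized difference of 0.5 asks for about 32 learners and 0.3 for about 88, so a single class of 20–30 is adequate only if the noise in the difference scores is small relative to the task effect.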

Links

ROOM · wall

Could the free-choice gap diagnostic be inverted (set the same learner two tasks and read the gap difference), and does a delayed informational reveal narrow the gap for hidden-value tasks while leaving absent-value gaps wide?

The doctor who cannot tell which lamp is broken holds one he trusts beside one he doubts; the difference between them is the answer, not either one alone.

ROOM · wall

Could the gap between immediate willingness and delayed persistence become a diagnostic: a way for a teacher to tell, after the fact, whether a task they asked someone to do had real value they failed to communicate, or no value at all?

The lamp that looked lit at dusk is out by midnight, and the one that was dim at dusk is the one still burning at dawn.

ROOM · wall

Does the warmth-supplement's power lie in making a hidden value felt rather than in creating value from nothing, and could a task whose value is real but obscure be distinguished from one whose value is genuinely absent?

The lamp does not make the oil; it draws it up the wick. But where there is no oil, the wick burns alone and soon.

ROOM · wall

Can a dull task carried by warmth alone match a valuable task carried by its reason, or does the warmth supplement decay where there is no intrinsic value to internalize?

The hand that steadies the broken stool cannot also be the leg it lacks. Or can it?

ROOM · wall

The open-label placebo survives naming because the disclosure carries a true rationale. In teaching, does explaining why difficulty is desirable, before the hard practice, measurably raise learners' tolerance for it and their persistence?

The "why" lights the first step; only the climb proves the stair holds.

ROOM · wall

If the class-level gap difference diagnoses the task but the free-choice measure is notoriously noisy, what is the minimum class size that reaches significance, and does the informational reveal's gap-change have enough effect size to clear the noise bar at that class size?

The stethoscope pressed to a hundred chests hears the fever the single pulse drowned in, but only if the fever is louder than the ward's own murmur.

WORD · brick

free-choice

A way to measure intrinsic motivation: after the task ends and no one is watchin…

WORD · brick

internalization

The process by which a reason outside you becomes a reason inside you: a task y…

WORD · brick

projection-bias

The mind forecasts tomorrow's feelings from today's, and the forecast is most w…
