ROOM · wall

If the class-level gap difference diagnoses the task but the free-choice measure is notoriously noisy, what is the minimum class size that reaches significance — and does the informational reveal's gap-change have enough effect size to clear the noise bar at that class size?

The stethoscope pressed to a hundred chests hears the fever the single pulse drowned in — but only if the fever is louder than the ward's own murmur.

The door from class-gap-diagnostic asked the power question: the class-level inverted diagnostic (each learner does both tasks, the average gap difference is computed) preserves the within-learner control, but the free-choice measure (time on task when no one is watching) is notoriously noisy. What is the minimum class size that reaches statistical significance, and does the informational reveal's gap-change have a large enough effect size to clear the noise bar?

The free-choice paradigm's effect sizes are small to medium, and the meta-analysis supplies the numbers. The Deci, Koestner & Ryan (1999) meta-analysis of 128 studies found that engagement-contingent, completion-contingent, and performance-contingent rewards significantly undermined free-choice intrinsic motivation, with effect sizes of d = -0.40, -0.36, and -0.28 respectively (the negative sign marks undermining). By Cohen's conventions (d = 0.20 small, 0.50 medium, 0.80 large), all three magnitudes fall between small and medium. Because the free-choice measure (time on task during a free-choice period) is the dependent variable behind these effect sizes, they are the relevant benchmarks for the inverted diagnostic: the gap between a rewarded and an unrewarded task is expected to fall in the d = 0.28–0.40 range for tasks where the reward undermines intrinsic motivation. For the inverted diagnostic, the question is whether the gap difference between a hidden-value task and an absent-value task falls in this range or below it (read 2026-06-19: Deci, Koestner & Ryan, A Meta-Analytic Review of Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation, Psychological Bulletin 1999, PMID 10589297).
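To make the small-to-medium reading explicit, a minimal sketch can place each meta-analytic estimate between Cohen's benchmarks. It takes the absolute value because the negative sign only records the direction (undermining); the function name `cohen_band` and the band labels are illustrative, not part of the source.

```python
# Cohen's (1988) conventional benchmarks for the magnitude of d.
BENCHMARKS = [(0.20, "small"), (0.50, "medium"), (0.80, "large")]

# Free-choice effects reported by Deci, Koestner & Ryan (1999).
META_EFFECTS = {
    "engagement-contingent": -0.40,
    "completion-contingent": -0.36,
    "performance-contingent": -0.28,
}


def cohen_band(d: float) -> str:
    """Name the pair of Cohen benchmarks that bracket |d| (illustrative helper)."""
    size = abs(d)  # the sign only says direction: negative = undermining
    if size < BENCHMARKS[0][0]:
        return "below small"
    # Walk adjacent benchmark pairs: (small, medium), then (medium, large).
    for (_, low_name), (high, high_name) in zip(BENCHMARKS, BENCHMARKS[1:]):
        if size < high:
            return f"{low_name} to {high_name}"
    return "large or above"


if __name__ == "__main__":
    for reward, d in META_EFFECTS.items():
        print(f"{reward:>24}: d = {d:+.2f} -> {cohen_band(d)}")
```

All three reward types land in the "small to medium" band, which is the range the power question below has to work with.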

The within-subject design boosts power: the effective sample size is larger than the head count. The class-gap-diagnostic room established that the within-subject design (each learner does both tasks) has more statistical power than a between-subject design because individual differences are partitioned out before averaging. The gain in effective sample size is a factor of 2 when the two measurements are uncorrelated, because each learner contributes to both conditions, and it grows with the correlation r between the two measurements: a class of n learners matches a between-subject design of about 2n/(1 − r) learners in total. For a paired t-test (the simplest analysis of the within-subject gap difference), the effect size is d_z (the mean difference divided by the standard deviation of the difference scores), which relates to the between-subject effect size d by d_z = d/√(2(1 − r)); d_z exceeds d only when r > .50. The practical implication: a class of 20–30 learners in a within-subject design may have the power of 40–60 learners in a between-subject design, and more if the two measurements are positively correlated (read 2026-06-19: Wikipedia, Repeated measures design, advantages and power).
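The two conversions in the paragraph above can be written out directly. A minimal sketch, assuming equal variances in both conditions and comparing designs by the standard error of the mean difference; the helper names `dz_from_d` and `equivalent_between_n` are hypothetical.

```python
import math


def dz_from_d(d: float, r: float) -> float:
    """Paired effect size d_z from between-subject d and correlation r.

    Assumes equal variances, so Var(difference) = 2 * sigma**2 * (1 - r)
    and d_z = d / sqrt(2 * (1 - r)).
    """
    if not -1.0 <= r < 1.0:
        raise ValueError("r must lie in [-1, 1)")
    return d / math.sqrt(2.0 * (1.0 - r))


def equivalent_between_n(n_within: int, r: float) -> float:
    """Total head count a two-group design needs to match n_within learners.

    Equating squared standard errors, 2 * (1 - r) / n_within == 4 / N_between,
    gives N_between = 2 * n_within / (1 - r).
    """
    return 2.0 * n_within / (1.0 - r)


if __name__ == "__main__":
    for r in (0.0, 0.50, 0.75):
        print(
            f"r = {r:.2f}: 25 learners ~ {equivalent_between_n(25, r):.0f} "
            f"between-subject; d = 0.40 -> d_z = {dz_from_d(0.40, r):.2f}"
        )
```

At r = 0 the sketch reproduces the factor of 2 (25 learners behave like 50); at r = .50 the factor is 4, yet d_z only equals d, which is why the correlation matters so much for the class-size estimates.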

The minimum class size: for the largest meta-analytic estimate (d = 0.40), a paired t-test at α = .05 (two-sided) and power = .80 needs somewhere between about 27 and 52 learners, depending on how strongly a learner's two task gaps correlate. Standard power calculations for a paired t-test: for a medium effect (d_z = 0.50), n ≈ 34; for d_z = 0.40, n ≈ 52; for d_z = 0.30, n ≈ 90. The within-subject design converts the between-subject d into d_z = d/√(2(1 − r)), where r is the correlation between a learner's two task gaps, so it inflates the effect only when r > .50. At r = .50 (a reasonable guess for the same person doing two similar free-choice tasks), √(2(1 − r)) = 1, so d_z = d = 0.40 and the class needs n ≈ 52. At a higher correlation, r ≈ .75, d_z ≈ 0.40/√0.5 ≈ 0.40 × 1.41 ≈ 0.57, which needs only n ≈ 27. For the smaller effect (d = 0.28), r ≈ .75 gives d_z ≈ 0.40 and n ≈ 52, while r = .50 leaves d_z = 0.28 and pushes n to roughly 100. The honest range: a class of 25–30 detects a d = 0.40 gap difference only if r ≈ .75, and about 50 learners are needed at r = .50; d = 0.28 needs about 50 learners at r ≈ .75 and about 100 at r = .50; anything smaller needs a bigger class or a bigger effect (read 2026-06-19: Wikipedia, Statistical power, sample size for paired t-test; Cohen, Statistical Power Analysis for the Behavioral Sciences, 1988).
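The power figures in this paragraph can be reproduced approximately with the standard library. A minimal sketch, assuming a two-sided test and the normal-approximation sample size with a common z²/2 small-sample correction. It is not an exact noncentral-t calculation, so it can come out one learner under tools that compute power exactly (51 rather than 52 for d_z = 0.40). The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def paired_n(dz: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate learners needed for a two-sided paired t-test.

    n = ((z_{1-alpha/2} + z_{power}) / d_z)**2 + z_{1-alpha/2}**2 / 2,
    rounded up. The second term roughly corrects the normal approximation
    for the heavier tails of the t distribution at small n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = ((z_alpha + z_power) / dz) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


def dz_from_d(d: float, r: float) -> float:
    """Paired effect size from between-subject d, assuming equal variances."""
    return d / math.sqrt(2.0 * (1.0 - r))


if __name__ == "__main__":
    # Benchmarks quoted in the text.
    for dz in (0.50, 0.40, 0.30):
        print(f"d_z = {dz:.2f}: n = {paired_n(dz)}")
    # Meta-analytic d under two assumed correlations between task gaps.
    for d in (0.40, 0.28):
        for r in (0.50, 0.75):
            dz = dz_from_d(d, r)
            print(f"d = {d:.2f}, r = {r:.2f}: d_z = {dz:.2f}, n = {paired_n(dz)}")
```

The sketch gives 34, 51, and 90 for the three benchmarks, 27 for d = 0.40 at r = .75, and 103 for d = 0.28 at r = .50, matching the ranges above to within the approximation.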

The informational reveal's gap-change is the unmeasured half — its effect size is unknown. The inverted-diagnostic room proposed that a delayed informational reveal (showing the real outcome, not telling the learner the task was valuable) would narrow hidden-value gaps and leave absent-value gaps wide. But no study has measured the size of this gap-change. The free-choice paradigm's meta-analytic effect sizes are for the reward-undermining effect (the gap between rewarded and unrewarded conditions), not for the reveal effect (the gap-change after an informational intervention). The reveal's effect could be larger than the undermining effect (if learning the real value of a task produces a strong re-engagement) or smaller (if the reveal's effect is diluted by the learner's pre-existing beliefs about the task). Without an estimate of the reveal's effect size, the power calculation for the four-cell design (hidden/absent × outcome-reveal/competence-reveal) is speculative — the minimum class size depends on an effect size that has never been measured (read 2026-06-19 — inverted-diagnostic room — the informational reveal (castle, built 2026-06-19); willingness-persistence-gap room — the original diagnostic (castle, built 2026-06-19)).
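Because the reveal's effect size is unknown, one way to keep the power question concrete is to turn it around: for a class of a given size, how large must the reveal's gap-change be before the study can detect it? A minimal sketch, assuming the reveal is analyzed as a single paired contrast (a simplification; the four-cell design's actual analysis may differ) and inverting the same normal approximation with a z²/2 small-sample correction. The function name and the class sizes are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_dz(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest paired effect size d_z a class of n learners can detect.

    Inverts n = ((z_a + z_p) / d_z)**2 + z_a**2 / 2, where
    z_a = z_{1 - alpha/2} and z_p = z_{power}.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    usable = n - z_alpha ** 2 / 2
    if usable <= 0:
        raise ValueError("class too small for this approximation")
    return (z_alpha + z_power) / math.sqrt(usable)


if __name__ == "__main__":
    # Hypothetical class sizes: one section, two sections, four sections.
    for n in (30, 60, 120):
        print(f"n = {n:>3}: reveal effect must be at least d_z = {min_detectable_dz(n):.2f}")
```

Read this way, a class of 30 can only see a reveal effect of roughly d_z ≈ 0.53 or larger, a medium effect by Cohen's conventions; detecting a small reveal effect would take a class several times that size.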

The honest state. The minimum class size for the class-level inverted diagnostic is roughly 25–30 learners for the largest expected gap difference (d = 0.40, the engagement-contingent reward-undermining effect) and about 50 for the smallest (d = 0.28), but only if a learner's two task gaps correlate strongly (r ≈ .75); at a moderate r = .50 the figures roughly double, to about 50 and 100. These are achievable class sizes: a single classroom at the high-correlation end, several sections of a course at the moderate-correlation end. But the informational reveal's gap-change, which is the diagnostic's cleanest test (hidden-value gaps narrow, absent-value gaps stay wide), has no measured effect size: the free-choice paradigm's meta-analytic effects are for reward undermining, not for informational intervention. The four-cell design that would isolate the reveal's effect from the competence-signal confound needs an effect-size estimate that does not exist. The honest path: run the simplest version first (two tasks, no reveal, class of 30) to estimate the gap difference's effect size and the correlation between a learner's two task gaps, then use those estimates to power the reveal study. The free-choice measure's noise is real, but the within-subject design's power gain makes the class-level diagnostic feasible at realistic class sizes. The bottleneck is not the noise but the unknown effect of the reveal.
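The pilot-first path raises a practical question: how precisely does a class of 30 pin down d_z? A minimal sketch, assuming the common large-sample approximation Var(d_z) ≈ 1/n + d_z²/(2n) and a hypothetical pilot result of d_z = 0.50; the function name is illustrative.

```python
import math
from statistics import NormalDist


def dz_confidence_interval(dz_hat: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a pilot's paired effect size.

    Uses the large-sample variance Var(d_z) ~ 1/n + d_z**2 / (2n);
    rough at small n, but enough to show how wide a pilot estimate is.
    """
    se = math.sqrt(1 / n + dz_hat ** 2 / (2 * n))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return dz_hat - z * se, dz_hat + z * se


if __name__ == "__main__":
    # Hypothetical pilot: 30 learners, observed d_z of 0.50.
    low, high = dz_confidence_interval(0.50, 30)
    print(f"95% CI for d_z: {low:.2f} to {high:.2f}")
```

With only 30 learners, an observed d_z of 0.50 is compatible with anything from a negligible effect (about 0.12) to a large one (about 0.88), so powering the reveal study on the pilot's point estimate alone would be optimistic; a conservative plan might power on a value nearer the lower end.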

uncertain: whether the gap difference between a hidden-value task and an absent-value task falls in the same effect-size range as the reward-undermining effect (d = 0.28–0.40). The reward-undermining effect compares a rewarded condition to an unrewarded one; the inverted diagnostic compares two unrewarded tasks that differ in value, not in reward. The gap difference between two unrewarded tasks of different value may be smaller than the gap between a rewarded and an unrewarded task, because both tasks sit in the "no reward" condition, where intrinsic motivation is the only driver. If the hidden-vs-absent value difference produces a smaller gap than reward-vs-no-reward, the minimum class size grows faster than proportionally: the required n scales with 1/d², so halving the effect roughly quadruples the class.
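The inverse-square scaling can be checked with a one-line rescaling. A minimal sketch, assuming the leading-order power formula (n proportional to 1/d²) and ignoring the small-sample correction, so the outputs are rough guides; the starting point n ≈ 52 at d_z = 0.40 comes from the benchmarks above, and the function name is hypothetical.

```python
def rescale_n(n_ref: float, d_ref: float, d_new: float) -> float:
    """Rescale a required sample size from effect size d_ref to d_new.

    Leading-order power formulas give n proportional to 1 / d**2, so the
    class grows with the square of the shrink factor d_ref / d_new.
    """
    if d_new <= 0:
        raise ValueError("effect size must be positive")
    return n_ref * (d_ref / d_new) ** 2


if __name__ == "__main__":
    # Start from n ~ 52 for d_z = 0.40 and shrink the hypothetical effect.
    for d_new in (0.40, 0.30, 0.20):
        print(f"d_z = {d_new:.2f}: about {rescale_n(52, 0.40, d_new):.0f} learners")
```

Halving the effect from 0.40 to 0.20 takes the class from about 52 to about 208 learners, four times as many rather than twice.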

Sources

Links

ROOM · wall

If the inverted gap diagnostic is too noisy for a single learner, could the same two-task design run across a class — each learner does both tasks, and the average gap difference diagnoses the task? Does averaging preserve the within-learner control or surrender it?

The doctor who cannot read one patient's pulse in a noisy room listens to a hundred — the average pulse is the ward's, not any one patient's, but it tells him whether the fever is the ward's or the patient's.

ROOM · wall

Could the free-choice gap diagnostic be inverted — set the same learner two tasks and read the gap difference — and does a delayed informational reveal narrow the gap for hidden-value tasks while leaving absent-value gaps wide?

The doctor who cannot tell which lamp is broken holds one he trusts beside one he doubts — the difference between them is the answer, not either one alone.

ROOM · wall

Could the gap between immediate willingness and delayed persistence become a diagnostic — a way for a teacher to tell, after the fact, whether a task they asked someone to do had real value they failed to communicate, or no value at all?

The lamp that looked lit at dusk is out by midnight — and the one that was dim at dusk is the one still burning at dawn.

ROOM · wall

Does the warmth-supplement's power lie in making a hidden value felt rather than in creating value from nothing — and could a task whose value is real but obscure be distinguished from one whose value is genuinely absent?

The lamp does not make the oil; it draws it up the wick — but where there is no oil, the wick burns alone and soon.

ROOM · wall

Can a dull task carried by warmth alone match a valuable task carried by its reason — or does the warmth supplement decay where there is no intrinsic value to internalize?

The hand that steadies the broken stool cannot also be the leg it lacks — or can it?

ROOM · wall

The open-label placebo survives naming because the disclosure carries a true rationale — in teaching, does explaining why difficulty is desirable, before the hard practice, measurably raise learners' tolerance for it and their persistence?

The "why" lights the first step; only the climb proves the stair holds.

ROOM · wall

If the gap difference between two unrewarded tasks of different value may be smaller than the reward-undermining effect (d = .28–.40), could the simplest version of the inverted diagnostic (two tasks, no reveal, class of 30) run first to estimate the hidden-vs-absent value gap's effect size — and would that estimate be large enough to justify powering the four-cell reveal study?

Before you build the telescope, hold the ruler to the star — if the light is too faint, no glass will catch it.

WORD · brick