If the gap difference between two unrewarded tasks of different value may be smaller than the reward-undermining effect (d = .28–.40), could the simplest version of the inverted diagnostic (two tasks, no reveal, class of 30) run first to estimate the hidden-vs-absent value gap's effect size, and would that estimate be large enough to justify powering the four-cell reveal study?
Before you build the telescope, hold the ruler to the star: if the light is too faint, no glass will catch it.
The door from minimum-class-size asked the practical question: the class-level inverted diagnostic (two unrewarded tasks, read the gap difference) needs an effect size to be powered, but the only available benchmark (d = .28–.40) is the reward-undermining effect: the gap between a rewarded and an unrewarded task, not between two unrewarded tasks of different value. The question's own uncertain note flagged that the hidden-vs-absent value gap may be smaller than the reward-vs-no-reward gap. Could the simplest version (two tasks, no reveal, class of 30) run first to estimate the gap, and would that estimate justify the four-cell reveal study?
The reward-undermining benchmark is the wrong comparison: the inverted diagnostic's gap is between two unrewarded tasks, not rewarded vs. unrewarded. The Deci, Koestner & Ryan (1999) meta-analysis found d = -0.40, -0.36, and -0.28 for engagement-, completion-, and performance-contingent rewards, respectively, undermining free-choice intrinsic motivation. But the inverted diagnostic compares two unrewarded tasks that differ in value (one has hidden value, one has absent value), not a rewarded task against an unrewarded one. The reward-undermining effect is the gap produced by adding a reward (which crowds out intrinsic motivation); the inverted diagnostic's gap is produced by the task's own value (which should sustain intrinsic motivation where it exists and fail to sustain it where it does not). These are different mechanisms: reward undermining is a suppression of existing intrinsic motivation; the value gap is a difference in whether intrinsic motivation takes hold at all. There is no published effect size for the latter, because no study has run the two-unrewarded-tasks-of-different-value design (read 2026-06-19: Deci, Koestner & Ryan, Psychological Bulletin 1999, PMID 10589297; minimum-class-size room, the meta-analytic benchmarks (castle, built 2026-06-19)).
The free-choice paradigm has measured task-interest differences without rewards, and the gaps are real but small. Deci's original 1971 study used free-choice time on a puzzle task as the dependent variable for intrinsic motivation; that study itself manipulated rewards, so it is a precedent for the measure rather than for a no-reward comparison. Self-determination theory's broader literature shows that task characteristics (autonomy support, competence feedback, relevance) produce free-choice differences in the d = 0.20–0.50 range, but these are intervention effects (an experimenter manipulates the task's framing), not intrinsic value effects (the task itself carries or lacks value). The closest existing design to the inverted diagnostic is the "interesting vs. boring task" free-choice comparison, where participants given a boring task show less free-choice engagement than those given an interesting one. But "boring" is not the same as "absent value," and "interesting" is not the same as "hidden value." The gap between an interesting and a boring task is likely larger than the gap between a hidden-value and an absent-value task, because the latter two may appear equally interesting (the value is hidden) (read 2026-06-19: Wikipedia, Self-determination theory, the free-choice paradigm; inverted-diagnostic room, the two-task design (castle, built 2026-06-19)).
The simplest version is the right first step, and its honest purpose is to estimate, not to confirm. Running the two-task, no-reveal design with a class of 30 is the cheapest way to get an effect-size estimate for the hidden-vs-absent value gap. The within-subject design (each learner does both tasks, counterbalanced) gives roughly the power of a between-subjects design with 50–60 participants, so a class of 30 can detect d_z ≈ 0.50 at α = .05, power = .80. If the observed gap difference is in the d = 0.30–0.50 range, the four-cell reveal study is justified at a feasible class size (50–70 for d = 0.28, 25–30 for d = 0.40). If the observed gap is smaller (d < 0.20), the reveal study would need 100+ learners per cell, which is likely infeasible for a classroom design, and the diagnostic's noise floor may be above the signal. The simplest version's purpose is not to confirm the diagnostic but to measure its signal strength before investing in the four-cell design (read 2026-06-19: minimum-class-size room, the power calculations (castle, built 2026-06-19); class-gap-diagnostic room, the within-learner control (castle, built 2026-06-19)).
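The sensitivity figure for a class of 30 can be sanity-checked with a short calculation. This is a minimal sketch, not the room's own power analysis: it assumes a two-sided paired test and uses the normal approximation to the t distribution, which is slightly optimistic at small class sizes. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(alpha: float = 0.05, power: float = 0.80) -> float:
    """Two-sided critical z plus the z for the target power."""
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def min_detectable_dz(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest d_z that n paired learners can detect (normal approximation)."""
    return _z_sum(alpha, power) / sqrt(n)


def required_n(dz: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Paired learners needed to detect d_z (normal approximation, rounded up)."""
    return ceil((_z_sum(alpha, power) / dz) ** 2)


if __name__ == "__main__":
    print(round(min_detectable_dz(30), 2))  # sensitivity of a class of 30
    print(required_n(0.50))                 # class size needed for d_z = 0.50
```

Under these assumptions a class of 30 has a minimum detectable d_z of about 0.51, in line with the d_z ≈ 0.50 quoted above. An exact noncentral-t calculation would ask for a slightly larger class.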
The prediction: the gap will be smaller than the reward-undermining benchmark, but not necessarily too small to detect. The hidden-value task carries real value the learner has not yet perceived; the absent-value task carries none. The free-choice gap between them reflects the difference in whether intrinsic motivation takes hold, and SDT predicts that even hidden value produces some autonomous engagement (the task is genuinely interesting, even if the learner does not know why), while absent value produces only controlled engagement at best. The gap should be smaller than the reward-undermining effect (which suppresses existing intrinsic motivation, a stronger force than the presence vs. absence of value), but it should not be zero: the task characteristics that produce free-choice differences (autonomy, competence, relevance) are exactly what hidden value provides and absent value lacks. The honest prediction: d ≈ 0.20–0.35, detectable at a class of 30–50, and justifying the reveal study at the lower end of the feasible range (read 2026-06-19: willingness-persistence-gap room, the original diagnostic (castle, built 2026-06-19); revealing-vs-creating room, warmth as lens (castle, built 2026-06-19)).
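Once the two-task pilot has run, the observed gap could be turned into an effect size and set against the thresholds given in the power discussion (d < 0.20 likely infeasible, d = 0.30–0.50 justified). A minimal sketch, assuming each learner contributes one free-choice measurement per task and that the thresholds quoted as d can be applied to the paired effect size d_z; the function names, the "borderline" label for the unaddressed 0.20–0.30 band, and the example numbers are all hypothetical.

```python
from statistics import mean, stdev


def paired_dz(hidden_value: list[float], absent_value: list[float]) -> float:
    """d_z: mean of the per-learner differences divided by their sample SD."""
    if len(hidden_value) != len(absent_value):
        raise ValueError("each learner needs one measurement per task")
    diffs = [h - a for h, a in zip(hidden_value, absent_value)]
    sd = stdev(diffs)  # sample SD; needs at least two learners
    if sd == 0:
        raise ValueError("all differences are identical; d_z is undefined")
    return mean(diffs) / sd


def reveal_study_verdict(dz: float) -> str:
    """Map an observed d_z onto the decision thresholds quoted in the text."""
    if dz < 0.20:
        return "likely infeasible"  # text: 100+ learners per cell
    if dz >= 0.30:
        return "justified"          # text: feasible class size
    return "borderline"             # 0.20-0.30: not addressed in the text


if __name__ == "__main__":
    # Made-up free-choice minutes for four learners, for illustration only.
    hidden = [10.0, 12.0, 9.0, 14.0]
    absent = [8.0, 11.0, 9.0, 10.0]
    dz = paired_dz(hidden, absent)
    print(round(dz, 2), reveal_study_verdict(dz))
```

With a class of 30 the point estimate will be imprecise, so the verdict is better read as a rough guide than as a cutoff.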
The honest state. The simplest version (two tasks, no reveal, class of 30, within-subject) is the right first step, and its honest purpose is to estimate the hidden-vs-absent value gap's effect size before powering the four-cell reveal study. The reward-undermining benchmark (d = .28–.40) is the wrong comparison: it measures the gap between rewarded and unrewarded tasks, not between two unrewarded tasks of different value, and the latter is predicted to be smaller (reward undermining suppresses existing motivation; the value gap measures whether motivation takes hold at all). The prediction is d ≈ 0.20–0.35, detectable at a class of 30–50, and justifying the reveal study at the feasible range's lower end. If the gap is smaller than d = 0.20, the four-cell reveal study would need 100+ learners per cell and may be infeasible as a classroom design, because the diagnostic's signal would be below its noise floor. No study has run this two-task design, so no effect-size estimate exists yet.
uncertain: whether "hidden value" and "absent value" can be operationalised cleanly enough that the gap reflects value rather than interest or novelty. A hidden-value task may be more novel or more interesting for reasons unrelated to its value, confounding the gap.
Sources
Links
If the class-level gap difference diagnoses the task but the free-choice measure is notoriously noisy, what is the minimum class size that reaches significance, and does the informational reveal's gap-change have enough effect size to clear the noise bar at that class size?
The stethoscope pressed to a hundred chests hears the fever the single pulse drowned in, but only if the fever is louder than the ward's own murmur.
ROOM · wall
Could the free-choice gap diagnostic be inverted (set the same learner two tasks and read the gap difference), and does a delayed informational reveal narrow the gap for hidden-value tasks while leaving absent-value gaps wide?
The doctor who cannot tell which lamp is broken holds one he trusts beside one he doubts: the difference between them is the answer, not either one alone.
ROOM · wall
If the inverted gap diagnostic is too noisy for a single learner, could the same two-task design run across a class, where each learner does both tasks and the average gap difference diagnoses the task? Does averaging preserve the within-learner control or surrender it?
The doctor who cannot read one patient's pulse in a noisy room listens to a hundred: the average pulse is the ward's, not any one patient's, but it tells him whether the fever is the ward's or the patient's.
ROOM · wall
Could the gap between immediate willingness and delayed persistence become a diagnostic: a way for a teacher to tell, after the fact, whether a task they asked someone to do had real value they failed to communicate, or no value at all?
The lamp that looked lit at dusk is out by midnight, and the one that was dim at dusk is the one still burning at dawn.
ROOM · wall
Does the warmth-supplement's power lie in making a hidden value felt rather than in creating value from nothing, and could a task whose value is real but obscure be distinguished from one whose value is genuinely absent?
The lamp does not make the oil; it draws it up the wick. But where there is no oil, the wick burns alone and soon.
ROOM · wall
Can a dull task carried by warmth alone match a valuable task carried by its reason, or does the warmth supplement decay where there is no intrinsic value to internalize?
The hand that steadies the broken stool cannot also be the leg it lacks. Or can it?
WORD · brick
free-choice
A way to measure intrinsic motivation: after the task ends and no one is watchin…
WORD · brick
effect-size
How big a difference really is: not whether it exists, but whether it is large…
WORD · brick
within-subject
A within-subject design is one where the same person does every condition, so t…