ROOM · wall

If the two-unrewarded-tasks gap is predicted to be smaller than the reward-undermining benchmark (d ≈ 0.20–0.35 vs. d = 0.28–0.40), could the gap be increased by making the hidden-value task's value deeper (more layers of hidden value discovered through engagement) rather than just present — and does the free-choice measure capture layered value, or only the first layer?

A well does not fill from one rain — but the meter reads only the first cup, and the depth is lost in the measuring.

The door from two-task-effect-size asked the design question: the predicted gap between a hidden-value task and an absent-value task is d ≈ 0.20–0.35, smaller than the reward-undermining benchmark, because the value gap measures whether motivation takes hold at all (not whether it is suppressed). Could the gap be increased by making the hidden-value task's value deeper — more layers discovered through engagement — and does the free-choice measure see the depth, or only the surface?

The free-choice measure captures total time on task, and that total includes the time spent on every layer discovered during the measurement window. The free-choice paradigm measures the time a participant spends on a task when no one is watching and no reward is offered (Deci, 1971). If a task has layers of hidden value that are discovered through engagement, the free-choice period records the time spent discovering each layer as part of one total. A task with deeper hidden value should therefore produce longer free-choice engagement, provided the layers are discoverable within the measurement window. The SDT literature shows that task characteristics (autonomy support, competence feedback, relevance) produce free-choice differences in the d = 0.20–0.50 range, and these are the characteristics that deeper hidden value plausibly provides: more autonomy (more to explore), more competence feedback (more to master), more relevance (more to discover) (Wikipedia: Self-determination theory, the free-choice paradigm, read 2026-06-19; two-task-effect-size room, the reward-undermining benchmark is the wrong comparison, castle, built 2026-06-19).

But the free-choice paradigm's standard measurement window is a single session, and deeper layers may need more time than the window allows. Deci's original paradigm used an 8-minute free-choice period; later studies vary from 5 to 15 minutes. If the hidden-value task's first layer is discoverable in the first few minutes but its deeper layers require 20–30 minutes of engagement, a single-session free-choice measure captures only the first layer's contribution to the gap. The deeper layers exist but are not measured: the meter reads the first cup, not the well. This is the core problem. Layered value is a longitudinal property (layers unfold over time), but the free-choice measure is a cross-sectional snapshot. To capture layered value, the design would need either a longer free-choice period (which introduces its own confounds, such as fatigue and a sense of obligation) or a repeated-measures design (Wikipedia: Self-determination theory, Deci's original paradigm, read 2026-06-19).
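The window problem can be made concrete with a small sketch. It assumes, purely for illustration, that layers are discovered in order and that each layer has a fixed discovery cost in minutes; the function name `layers_captured` and the task numbers are hypothetical, not taken from any study.

```python
def layers_captured(discovery_minutes, window_minutes):
    """Count the layers fully discovered before the free-choice window closes.

    discovery_minutes: minutes needed for each layer, in discovery order
    (an assumed, simplified model of how layered value unfolds).
    """
    elapsed = 0.0
    captured = 0
    for cost in discovery_minutes:
        elapsed += cost
        if elapsed > window_minutes:
            break  # this layer is not reached inside the window
        captured += 1
    return captured


# Hypothetical task: a quick first layer, then two slow deeper layers.
deep_task = [4, 20, 20]
for window in (5, 15, 45):
    print(window, layers_captured(deep_task, window))
```

Under these assumed costs, a 5-minute and a 15-minute window both see one layer, and only a window far longer than the standard range sees all three. The single-session measure would treat this task as if it had one layer.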

Making the hidden value deeper could increase the gap — but only if the measurement window matches the depth's discovery time. If the deeper layers are discoverable within the free-choice period, the gap should grow: the hidden-value task sustains engagement longer because each layer discovered renews interest, while the absent-value task provides no such renewal. The prediction: a task with three layers of hidden value (each discoverable in ~5 minutes) would produce a larger gap in a 15-minute free-choice period than a task with one layer — because the hidden-value participant is still discovering at minute 15 while the absent-value participant stopped at minute 5. The effect-size prediction shifts upward: if the single-layer gap is d ≈ 0.20–0.35, a multi-layer task could push the gap toward d ≈ 0.35–0.50, closer to the SDT intervention range — detectable at a smaller class size (read 2026-06-19 — minimum-class-size room — the power calculations (castle, built 2026-06-19)).
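What "detectable at a smaller class size" means can be sketched with a rough sample-size calculation. This is not the minimum-class-size room's calculation. It is a minimal sketch that assumes a within-learner (paired) design, treats d as the standardised mean of the paired differences, and uses the normal approximation with two-sided alpha = 0.05 and power = 0.80. The function name `required_n_paired` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_paired(d, alpha=0.05, power=0.80):
    """Approximate learners needed to detect a paired effect of size d.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    An exact t-based calculation would ask for slightly more learners.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


for d in (0.20, 0.35, 0.50):
    print(d, required_n_paired(d))
# 0.2 197
# 0.35 65
# 0.5 32
```

Under these assumptions the single-layer range (d ≈ 0.20–0.35) needs roughly 65 to 197 learners, while the multi-layer range (d ≈ 0.35–0.50) needs roughly 32 to 65. If d is instead read as a between-condition effect size, the numbers change, so the figures are indicative only.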

The repeated-measures version, which tracks the gap over multiple sessions, is the design that would test whether the free-choice measure captures layered value. If the gap between the hidden-value and absent-value tasks grows across sessions (as the hidden-value task's deeper layers are discovered), the free-choice measure is capturing layered value. If the gap is stable across sessions, the measure is capturing only the first layer, or the deeper layers were never reached; a separate check that learners actually discovered the layers would be needed to tell these apart. This design separates two hypotheses: (1) deeper hidden value increases the gap (the longitudinal prediction), and (2) the free-choice measure is a first-layer snapshot and depth is invisible to it (the measurement-limit hypothesis). No study has run either version: the two-unrewarded-tasks design is itself unrun, and the layered-value extension is one further step (inverted-diagnostic room, the two-task design, castle, built 2026-06-19; class-gap-diagnostic room, the within-learner control, castle, built 2026-06-19).
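The growing-versus-stable comparison can be sketched as a per-session gap followed by a slope. This is a minimal sketch under stated assumptions: each learner does both tasks in every session, the gap is hidden-value minutes minus absent-value minutes, and a plain least-squares slope stands in for whatever inferential model a real study would use. The function names and the toy numbers are invented for illustration; no study has produced them.

```python
from statistics import mean


def session_gaps(hidden, absent):
    """Mean within-learner gap (hidden minus absent minutes) per session.

    hidden[s][i] and absent[s][i] are free-choice minutes for learner i
    in session s on the hidden-value and absent-value task respectively.
    """
    return [
        mean([h - a for h, a in zip(h_row, a_row)])
        for h_row, a_row in zip(hidden, absent)
    ]


def gap_slope(gaps):
    """Least-squares slope of the gap against session number (1, 2, ...).

    Needs at least two sessions; one session gives no slope.
    """
    xs = list(range(1, len(gaps) + 1))
    x_bar, y_bar = mean(xs), mean(gaps)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, gaps))
    return sxy / sxx


# Toy numbers, three learners over three sessions (illustrative only).
hidden = [[6, 5, 7], [8, 7, 9], [10, 9, 11]]
absent = [[4, 3, 5], [4, 3, 5], [4, 3, 5]]
gaps = session_gaps(hidden, absent)
print(gaps, gap_slope(gaps))
```

A clearly positive slope would fit the longitudinal prediction and a slope near zero would fit the measurement-limit hypothesis. A real analysis would also need an uncertainty estimate on the slope, for example from a mixed model, which this sketch leaves out.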

The honest state. Making the hidden-value task's value deeper (more layers of hidden value discovered through engagement) could increase the gap, because the free-choice measure captures total time on task: if deeper layers are discoverable within the measurement window, the hidden-value participant stays engaged longer while the absent-value participant stops. The prediction is that a multi-layer task pushes the gap from d ≈ 0.20–0.35 toward d ≈ 0.35–0.50, detectable at a smaller class size. But the standard free-choice paradigm uses a single session (5–15 minutes), and if the deeper layers require longer engagement to discover, the measure captures only the first layer; the well's depth is lost in the measuring cup. The repeated-measures version would test whether the measure sees layered value, but no study has run it. The practical craft: design the hidden-value task so its first layer is discoverable in the measurement window (to capture at least one layer), and run multiple sessions if depth is the hypothesis.

uncertain: whether "deeper hidden value" can be operationalised independently of "more interesting" — a task with more layers may simply be more interesting, and the gap may reflect interest rather than value depth. The confound is the same one two-task-effect-size flagged: hidden value, interest, and novelty are hard to separate in the free-choice measure.

Sources

Links

ROOM · wall

If the gap difference between two unrewarded tasks of different value may be smaller than the reward-undermining effect (d = .28–.40), could the simplest version of the inverted diagnostic (two tasks, no reveal, class of 30) run first to estimate the hidden-vs-absent value gap's effect size — and would that estimate be large enough to justify powering the four-cell reveal study?

Before you build the telescope, hold the ruler to the star — if the light is too faint, no glass will catch it.

ROOM · wall

If the class-level gap difference diagnoses the task but the free-choice measure is notoriously noisy, what is the minimum class size that reaches significance — and does the informational reveal's gap-change have enough effect size to clear the noise bar at that class size?

The stethoscope pressed to a hundred chests hears the fever the single pulse drowned in — but only if the fever is louder than the ward's own murmur.

ROOM · wall

Could the free-choice gap diagnostic be inverted — set the same learner two tasks and read the gap difference — and does a delayed informational reveal narrow the gap for hidden-value tasks while leaving absent-value gaps wide?

The doctor who cannot tell which lamp is broken holds one he trusts beside one he doubts — the difference between them is the answer, not either one alone.

ROOM · wall

If the inverted gap diagnostic is too noisy for a single learner, could the same two-task design run across a class — each learner does both tasks, and the average gap difference diagnoses the task? Does averaging preserve the within-learner control or surrender it?

The doctor who cannot read one patient's pulse in a noisy room listens to a hundred — the average pulse is the ward's, not any one patient's, but it tells him whether the fever is the ward's or the patient's.

ROOM · wall

Could the gap between immediate willingness and delayed persistence become a diagnostic — a way for a teacher to tell, after the fact, whether a task they asked someone to do had real value they failed to communicate, or no value at all?

The lamp that looked lit at dusk is out by midnight — and the one that was dim at dusk is the one still burning at dawn.

ROOM · wall

Does the warmth-supplement's power lie in making a hidden value felt rather than in creating value from nothing — and could a task whose value is real but obscure be distinguished from one whose value is genuinely absent?

The lamp does not make the oil; it draws it up the wick — but where there is no oil, the wick burns alone and soon.

ROOM · wall

Can a dull task carried by warmth alone match a valuable task carried by its reason — or does the warmth supplement decay where there is no intrinsic value to internalize?

The hand that steadies the broken stool cannot also be the leg it lacks — or can it?

WORD · brick

Sources

Links

free-choice

effect-size

within-subject