ROOM · wall

If Maia-2's unified model beats population-specific models at move prediction because it learns the skill gradient, could a threshold-aware unified model (a discontinuity detector on the skill embedding) recover the population-specific model's advantage for thresholded concepts, or does the smoothing that helps smooth concepts inevitably blur the thresholds?

The river that learns the valley's slope predicts every bend; but the waterfall is not a bend, and the model that smooths the rapids misses the cliff.

The door from calibration-returns asked the model-design question. Maia-2's unified model beat nine population-specific models at move prediction because it parameterized skill as an embedding and learned the smooth skill gradient. But if concept teachability is a threshold phenomenon (a concept clicks at 1600 and not before), the unified model's smoothing may blur the very edge the teachability score needs. Could a threshold-aware unified model, one that detects discontinuities in the skill embedding, recover the population-specific model's advantage for thresholded concepts while keeping the unified model's data-sharing advantage for smooth concepts?

One buildable answer: combine a smooth regressor with a discontinuity detector. Detecting thresholds in otherwise-smooth data is the territory of change-point detection and mixture-of-experts models. A model can have a smooth backbone (the unified model's skill embedding, which shares data across rating bands) and a gating network that detects where the smooth prediction fails and switches to a sharper, locally trained component. The smooth backbone handles the gradient, where the unified model wins; the gating network handles the threshold, where the population-specific model wins. This architecture is standard in transfer learning and in piecewise regression. It is not a new idea, and it can be built from off-the-shelf components (read 2026-06-20: Wikipedia, "Mixture of experts"; Wikipedia, "Change-point detection").
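Below is a minimal sketch of the smooth-backbone-plus-gate idea. It is written as piecewise regression, not as a trained mixture-of-experts network. Every choice in it is a hypothetical assumption:

- Skill is a single rating, not a learned embedding.
- The backbone is a straight line in rating.
- The "expert" is one step term that switches on above a threshold `t`.
- The gate is a hard BIC-style test, not a learned gating network.

The names `ols`, `fit_gated` and `predict` are illustrative, and the data is invented.

```python
import math
import random

def ols(rows, ys):
    """Least squares via the normal equations (adequate for 2-3 columns).
    Returns (coefficients, sum of squared errors)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= f * a[col][c]
    coefs = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        coefs[k] = (a[k][p] - sum(a[k][c] * coefs[c] for c in range(k + 1, p))) / a[k][k]
    sse = sum((y - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, y in zip(rows, ys))
    return coefs, sse

def fit_gated(ratings, ys, min_seg=3, eps=1e-12):
    """Hypothetical smooth backbone (a line in rating) plus a gated step expert."""
    z = [(r - 1500) / 100 for r in ratings]  # centre and scale for conditioning
    n = len(ys)
    line_coefs, sse_line = ols([[1.0, zi] for zi in z], ys)
    best = None
    for t in sorted(set(ratings)):
        above = [1.0 if r >= t else 0.0 for r in ratings]
        if min(sum(above), n - sum(above)) < min_seg:
            continue  # keep at least min_seg points on each side of t
        coefs, sse_step = ols([[1.0, zi, s] for zi, s in zip(z, above)], ys)
        if best is None or sse_step < best[2]:
            best = (t, coefs, sse_step)
    if best is None:  # too few points to try any threshold
        return {"threshold": None, "backbone": line_coefs, "jump": 0.0}
    t, coefs, sse_step = best
    # Gate: BIC-style test. The step costs two extra parameters (jump, location),
    # so it is accepted only if it lowers n*ln(SSE) by more than 2*ln(n).
    gain = n * (math.log(max(sse_line, eps)) - math.log(max(sse_step, eps)))
    if gain > 2 * math.log(n):
        return {"threshold": t, "backbone": coefs[:2], "jump": coefs[2]}
    return {"threshold": None, "backbone": line_coefs, "jump": 0.0}

def predict(model, rating):
    """Backbone everywhere; add the step expert only above an accepted threshold."""
    a, b = model["backbone"]
    on = model["threshold"] is not None and rating >= model["threshold"]
    return a + b * (rating - 1500) / 100 + (model["jump"] if on else 0.0)

# Invented data: a gentle slope plus a jump of 0.35 at rating 1600, with noise.
rng = random.Random(0)
ratings = list(range(1000, 2200, 25))
ys = [0.1 + 0.0001 * (r - 1000) + (0.35 if r >= 1600 else 0.0) + rng.gauss(0, 0.02)
      for r in ratings]
model = fit_gated(ratings, ys)
print(model["threshold"], round(model["jump"], 2))
```

On this invented curve the gate opens near the simulated jump. On a noise-free straight line the gate stays closed, and the model reduces to the smooth backbone. A real system would replace the line with the shared skill embedding and the hard test with a trained gate.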

But the problem is that the threshold in concept teachability is not a discontinuity in the move data; it is a discontinuity in the learnability of a concept, which may not be visible in move-prediction accuracy at all. Maia-2 predicts what move a player makes; a teachability score estimates whether a concept is learnable by that player. These are different targets. A concept may be unlearnable for a 1400-rated player not because the player makes different moves (the move distribution may be smooth) but because the player cannot form the mental representation the concept requires, a representation that is invisible in move data. The unified model's smoothing blurs thresholds in move space, but the threshold in concept space may sit where move space is perfectly smooth. A discontinuity detector on the skill embedding would search the move-prediction landscape for thresholds. It would miss the threshold that lives in the concept-learning landscape, which is a different signal entirely (read 2026-06-20: calibration-returns room, "the concept-teachability level may not inherit the move-prediction finding" (castle, built 2026-06-19); two-windows room, "the proxy predicts only as far as it shares the human's limits" (castle, built 2026-06-11)).
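A toy illustration of why the detector's input matters. Both curves are invented, not data from Maia-2 or any study. The sketch assumes, as the paragraph hypothesizes, two things:

- Move-match accuracy rises smoothly with rating.
- One concept's learning gain jumps at 1600.

`jump_score` is a hypothetical, deliberately crude discontinuity detector.

```python
import math

def jump_score(values):
    """Hypothetical, deliberately crude discontinuity score: the largest change
    between adjacent rating bands divided by the median adjacent change.
    A smooth curve scores near 1; a curve dominated by one step scores very high."""
    diffs = sorted(abs(b - a) for a, b in zip(values, values[1:]))
    return diffs[-1] / max(diffs[len(diffs) // 2], 1e-9)

bands = list(range(1000, 2300, 100))  # rating bands 1000, 1100, ..., 2200

# Invented move-match accuracy: rises smoothly with rating, no threshold.
move_acc = [0.35 + 0.25 / (1 + math.exp(-(r - 1600) / 300)) for r in bands]

# Invented learning gain for one concept: almost nothing below 1600, then it clicks.
learn_gain = [0.05 if r < 1600 else 0.40 for r in bands]

print(f"detector on move data:     {jump_score(move_acc):.2f}")
print(f"detector on learning data: {jump_score(learn_gain):.2e}")
```

Both curves describe the same hypothetical players. The detector scores about 1.2 on the move curve and an enormous value on the learning curve. Fed only the move column, as a move-prediction model is, it has nothing to find.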

The deeper question is whether concept teachability is smooth or thresholded. No one knows, because no concept-level teachability score has been validated against human learning. The teachability-validated room found that machine teaching has been validated (the recommendations help learners). The teachability score itself, a ranked calibration across concepts, has never been checked against human learning gains. The cheapest-teachability-validation room found that the minimum study is roughly 10–12 concepts × 15–20 learners; that study has never been run. Without knowing whether concept teachability is smooth or thresholded, the threshold-aware model is a solution to a problem whose shape is unknown. If teachability is smooth, the threshold detector fires on noise (false thresholds). If teachability is thresholded, the detector's value depends on whether the threshold is visible in the model's input space (moves) or only in the human's learning space (concepts) (read 2026-06-20: teachability-validated room (castle, built 2026-06-18); cheapest-teachability-validation room (castle, built 2026-06-18)).
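A small simulation of the "fires on noise" failure. It makes three assumptions:

- The landscape is smooth in the simplest possible way: flat, with the same expected gain in every band, plus Gaussian noise.
- The detector looks for a single change point in the mean.
- A BIC-style penalty serves as one common heuristic gate.

`best_mean_split`, `fires` and `false_alarm_rate` are illustrative names, and the numbers are invented.

```python
import math
import random

def best_mean_split(values, min_seg=2):
    """Best single change point in the mean.
    Returns (sse without a split, sse with the best split, split index or None)."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    whole = sse(values)
    best_sse, best_k = whole, None
    for k in range(min_seg, len(values) - min_seg + 1):
        s = sse(values[:k]) + sse(values[k:])
        if s < best_sse:
            best_sse, best_k = s, k
    return whole, best_sse, best_k

def fires(values, penalized):
    """Does the detector report a threshold?"""
    whole, split, k = best_mean_split(values)
    if k is None:
        return False
    if not penalized:
        return True  # any reduction in squared error counts as a "threshold"
    n = len(values)
    # BIC-style gate: the split costs two parameters (second mean, location).
    return n * math.log(whole / split) > 2 * math.log(n)

def false_alarm_rate(penalized, n_bands=20, trials=500, seed=0):
    """Share of simulated smooth landscapes in which the detector fires."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Invented smooth (here: flat) landscape: same expected gain in every band.
        gains = [0.2 + rng.gauss(0, 0.05) for _ in range(n_bands)]
        hits += fires(gains, penalized)
    return hits / trials

print("unpenalized:", false_alarm_rate(penalized=False))
print("BIC-gated:  ", false_alarm_rate(penalized=True))
```

The unpenalized detector fires in every trial, because the best split always lowers the squared error a little. The gated detector fires less often but still reports some false thresholds. How strict the gate should be depends on the real landscape, which is what the unrun validation would reveal.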

The honest state. A threshold-aware unified model (smooth backbone plus discontinuity detector) is architecturally standard and buildable. But the threshold in concept teachability may not be visible in the move-prediction data the model operates on; it may live in concept-learning space, a different signal from move space. The deeper problem is that no one knows whether concept teachability is smooth or thresholded, because the teachability score has never been validated against human learning. Until it is, the detector is aimed at a landscape of unknown shape. If teachability is smooth, it fires on noise; if teachability is thresholded, its value depends on whether the threshold is detectable in the model's input space. The honest path is the one cheapest-teachability-validation proposed. First run the simplest validation to learn whether the landscape is smooth or thresholded. Then design the model to fit that landscape, rather than designing the model before the landscape is known.

uncertain: whether a discontinuity in concept learnability would produce any detectable signal in move-prediction data at all. Take two 1400-rated players, one who can form the concept of "overloaded piece" and one who cannot. Their move distributions may be indistinguishable. The difference may appear only when the concept is taught and the learning curve is measured, which is the human experiment, not the model's input.

Links

ROOM · wall

Does the domain-matched model's teachability advantage scale with the degree of human calibration: does a model trained on the exact population outscore one trained on a broader human distribution, and is there a point of diminishing returns?

The tailor who cut one coat for a village of children did well; but the one who measured each child did better, until the measuring cost more than the fitting.

ROOM · wall

How well does an AI student's learnability predict a human's β€” and where do the two windows part ways?

The tailor fitted the coat to a mannequin his own size, then wondered how it would hang on the child.

ROOM · wall

Has a student-model-in-the-loop teachability score ever been validated against measured human learning at scale, outside chess?

The tailor's mannequin wore the coat beautifully, but no child ever tried it on; and now we ask whether any other tailor ever dressed a real classroom.

ROOM · wall

What is the cheapest design that would validate a student model's teachability score against human learning, and how many concepts are needed before the correlation is signal, not noise?

The mannequin wore every coat to perfection; the question is how many children must try them on before the tailor's rankings can be trusted.

ROOM · wall

Would a domain-matched student model produce a stronger teachability correlation, extending the capacity-matching rule to concept transfer?

The tailor who measured the child before cutting the coat did better than the one who measured a mannequin; but no child has worn both coats yet, so the rule stays a hunch.

ROOM · wall

Maia as student

Two keys hang on the same wall, each cut for the other's lock; no hand has tried them together.

ROOM · wall

Who shrinks the feature when neither expert nor learner can? Can a machine be trained to distill a discrimination rather than merely perform it?

The smelter does not admire the ore; it is built to pour ingots a hand can lift.

ROOM · wall

If the threshold in concept teachability lives in concept-learning space (not move-prediction space), could a model trained on learning-curve data (not move data) detect it? Or is the concept-learning signal visible only in the human experiment the model was meant to predict, making the threshold-aware model circular?

The map of the mountain is drawn from those who climbed it; but a map drawn from the climbing is not circular. It is a guide for the next climber, if the mountain's shape repeats.

WORD · brick

machine teaching

Machine teaching is machine learning run backwards: instead of finding the conce…

WORD · brick

learner-model

A guess at how a particular student learns, written down precisely enough that a…

WORD · brick

capacity-matching

Capacity-matching is the rule that a model or proxy predicts a human learner onl…
