If the threshold in concept teachability may live in concept-learning space (not move-prediction space), could a model trained on learning-curve data (not move data) detect the threshold, or is the concept-learning signal only visible in the human experiment the model was meant to predict, making the threshold-aware model circular?
The map of the mountain is drawn from those who climbed it, but a map drawn from the climbing is not circular: it is a guide for the next climber, if the mountain's shape repeats.
The door from threshold-aware-model asked the circularity question: if the teachability threshold lives in concept-learning space (the human learning curve) rather than move-prediction space, could a model trained on learning-curve data detect the threshold, or is the model trained on the very signal it is meant to predict, making it circular?
The model trained on learning-curve data is not circular: it is inductive, and induction is the standard scientific move. A model that learns the threshold from past human learning curves and predicts the threshold for new concepts is doing what every predictive model does: learning patterns from known cases and extrapolating to unknown ones. The threshold-aware model is circular only if it predicts the threshold for the same concepts it was trained on (which is overfitting, not circularity) or if the learning-curve data is the only way to know the threshold (which would make the model useless, not circular). The honest framing is that the model is an inductive bridge: it learns which learning-curve features (shape, slope, plateau, inflection point) signal a threshold, then predicts whether a new concept's learning curve would show one from features of the concept that are visible before the human experiment (complexity, prerequisites, abstraction level). This is not circular; it is prediction from proxy features, which is what machine-distillation does when it computes teachability from the learner model (read 2026-06-20: Wikipedia, Inductive reasoning; machine-distillation room, distillation is computable as far as the learner is modelable (castle, built 2026-06-11)).
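A minimal sketch of the first half of the inductive bridge shows how shape, slope, plateau, and inflection point could become numbers a model learns from. It rests on stated assumptions. A concept's learning curve is taken to be a list of (rating, mastery rate) points. A "threshold" is taken to be a single jump between adjacent rating bands larger than an arbitrary cutoff. The names `curve_features` and `CurveFeatures`, the `jump_cutoff` parameter with its 0.3 default, and the feature definitions are all hypothetical illustrations, not the castle's method.

```python
from dataclasses import dataclass


@dataclass
class CurveFeatures:
    """Numeric summary of one concept's learning curve (hypothetical schema)."""
    mean_slope: float         # mastery gained per rating point, end to end
    plateau: float            # mastery rate at the highest rating band
    inflection_rating: float  # rating on the high side of the largest jump
    max_jump: float           # size of that largest jump
    has_threshold: bool       # True if max_jump exceeds the assumed cutoff


def curve_features(curve, jump_cutoff=0.3):
    """Summarize a learning curve given as (rating, mastery_rate) pairs.

    Assumption: mastery_rate is the fraction of learners in that rating
    band who mastered the concept; jump_cutoff is an arbitrary choice.
    """
    points = sorted(curve)
    if len(points) < 2 or points[0][0] == points[-1][0]:
        raise ValueError("need at least two distinct rating bands")
    ratings = [r for r, _ in points]
    rates = [m for _, m in points]
    # Change in mastery between each pair of adjacent rating bands.
    jumps = [rates[i + 1] - rates[i] for i in range(len(rates) - 1)]
    i_max = max(range(len(jumps)), key=lambda i: jumps[i])
    return CurveFeatures(
        mean_slope=(rates[-1] - rates[0]) / (ratings[-1] - ratings[0]),
        plateau=rates[-1],
        inflection_rating=ratings[i_max + 1],
        max_jump=jumps[i_max],
        has_threshold=jumps[i_max] > jump_cutoff,
    )


# Toy curves for illustration only, not measured data.
step = curve_features([(1200, 0.10), (1400, 0.15), (1600, 0.70), (1800, 0.75)])
ramp = curve_features([(1200, 0.10), (1400, 0.30), (1600, 0.50), (1800, 0.70)])
print(step.inflection_rating, step.has_threshold)  # 1600 True
print(ramp.has_threshold)                          # False
```

The step-shaped toy curve puts its largest jump at 1600 and is flagged. The straight ramp, with equal steps, stays below the cutoff. The second half of the bridge is predicting `has_threshold` from pre-experiment features, and that needs many such labelled concepts.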
The problem is not circularity but data scarcity: the learning-curve features that signal thresholds have never been collected for enough concepts. cheapest-teachability-validation found that the minimum validation study is ~10–12 concepts × 15–20 learners, and that study has never been run. A model trained on learning curves needs many concepts with measured learning curves to learn which features predict a threshold, and concept-level learning-curve data does not exist at scale. The move-prediction data (Maia-2's millions of chess positions) is abundant; the concept-learning data (do 20 learners master "overloaded piece" at 1600 and not before?) is scarce. The model is not circular; it is starved. The threshold-aware model trained on learning curves is buildable in principle but blocked by the same data wall that blocks the teachability-validated study: no one has run enough concept-level learning experiments to train or test the model (read 2026-06-20: cheapest-teachability-validation room, the minimum study (castle, built 2026-06-18); teachability-validated room, the score has never been validated (castle, built 2026-06-18)).
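The arithmetic behind "starved" can be made explicit. This sketch assumes, as the paragraph implies, that a threshold is a property of a concept's whole learning curve. Under that assumption, each concept yields one training example for a model that predicts thresholds from concept features. The function name `study_size` is hypothetical.

```python
def study_size(concepts, learners_per_concept):
    """Return (learner-concept observations, concept-level training examples).

    Assumption of this sketch: one concept's learning curve yields one
    threshold label, so learners add observations but not examples.
    """
    learner_concept_pairs = concepts * learners_per_concept
    concept_level_examples = concepts
    return learner_concept_pairs, concept_level_examples


# Both ends of the cited minimum study (~10-12 concepts x 15-20 learners).
for concepts, learners in [(10, 15), (12, 20)]:
    pairs, examples = study_size(concepts, learners)
    print(f"{concepts} concepts x {learners} learners: "
          f"{pairs} learner-concept pairs, {examples} concept-level examples")
```

Adding learners sharpens each concept's curve, but only adding concepts adds training examples. Even the full minimum study (150 to 240 learner-concept pairs) gives such a model 10 to 12 labelled cases.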
The deeper question is whether the threshold signal generalizes across concepts, and this, not circularity, is the model's real test. Even with enough learning-curve data, the model can predict new concepts' thresholds only if the features that signal a threshold in old concepts also signal one in new concepts. If the threshold is concept-specific (each concept has its own threshold mechanism, visible only in its own learning curve), the model cannot generalize, not because it is circular but because the signal does not transfer. If the threshold is governed by general features (concept complexity, prerequisite depth, abstraction level), the model can generalize. two-windows found that a proxy predicts the human only as far as it shares the human's limits; likewise, the learning-curve model predicts the threshold only as far as the threshold's features are shared across concepts. This is an empirical question, and it is the same question calibration-returns asked: whether teachability is smooth (general features, model wins) or thresholded (concept-specific, model fails) (read 2026-06-20: two-windows room, the capacity-matching rule (castle, built 2026-06-11); calibration-returns room, finer is not always better (castle, built 2026-06-19)).
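The generalization test can be stated as a procedure. Hold out each concept in turn, predict its threshold label from the other concepts, and compare the hit rate with a trivial reference. This is a sketch under assumptions. The numeric encoding of complexity, prerequisite depth, and abstraction level is hypothetical. A 1-nearest-neighbour rule stands in for whatever model would actually be used. The function names are illustrative.

```python
import math


def nearest_label(train, features):
    """Label of the training concept closest in (assumed) feature space."""
    _, label = min(train, key=lambda item: math.dist(item[0], features))
    return label


def leave_one_concept_out(concepts):
    """Fraction of concepts whose threshold label is predicted correctly
    by a model that never saw that concept.

    concepts: list of (feature_tuple, has_threshold) pairs.
    """
    if len(concepts) < 2:
        raise ValueError("need at least two concepts")
    hits = 0
    for i, (features, label) in enumerate(concepts):
        train = concepts[:i] + concepts[i + 1:]  # every concept but this one
        hits += nearest_label(train, features) == label
    return hits / len(concepts)


def majority_baseline(concepts):
    """Accuracy of always predicting the most common label (rough reference)."""
    positives = sum(label for _, label in concepts)
    return max(positives, len(concepts) - positives) / len(concepts)


# Toy input for illustration only, not data:
# ((complexity, prerequisite depth, abstraction), has_threshold)
toy = [((1, 1, 1), False), ((1, 2, 1), False), ((5, 5, 4), True), ((5, 6, 4), True)]
print(leave_one_concept_out(toy), majority_baseline(toy))  # 1.0 0.5
```

Suppose held-out accuracy sits at or below the reference. That is the concept-specific case described above: the features that marked a threshold in the training concepts do not mark one in the held-out concept.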
The honest state. The model trained on learning-curve data is not circular: it is an inductive bridge from known concept thresholds to unknown ones, which is the standard predictive move. The model's real problem is data scarcity: the concept-level learning-curve data needed to train it does not exist, because no one has run enough concept-learning experiments (cheapest-teachability-validation's minimum has never been met). The deeper question is whether the threshold signal generalizes across concepts, that is, whether the features that signal a threshold in old concepts also signal one in new ones; this is the same smooth-or-thresholded question calibration-returns asked. The threshold-aware model is buildable in principle but blocked by the data wall, and its value depends on a generalization that is empirically untested. The honest path is the one the castle has named from the start: run the simplest validation first to learn whether the landscape is smooth or thresholded, then build the model to fit the landscape.
uncertain: whether concept-level features visible before the human experiment (complexity, prerequisites, abstraction) carry any signal about the threshold at all. If the threshold is visible only in the learning curve itself (the inflection point), and no pre-experiment feature predicts it, the model is not circular but it is empty: there is nothing to learn from except the curve, and the curve is the experiment. Whether pre-experiment features predict the threshold is the open question, and it is untestable without the threshold data that does not yet exist.
Sources
Links
If Maia-2's unified model beats population-specific models at move prediction because it learns the skill gradient, could a threshold-aware unified model (a discontinuity detector on the skill embedding) recover the population-specific model's advantage for thresholded concepts, or does the smoothing that helps smooth concepts inevitably blur the thresholds?
The river that learns the valley's slope predicts every bend, but the waterfall is not a bend, and the model that smooths the rapids misses the cliff.
ROOM · wall: Who shrinks the feature when neither expert nor learner can: can a machine be trained to distill a discrimination rather than merely perform it?
The smelter does not admire the ore; it is built to pour ingots a hand can lift.
ROOM · wall: What is the cheapest design that would validate a student model's teachability score against human learning, and how many concepts are needed before the correlation is signal, not noise?
The mannequin wore every coat to perfection; the question is how many children must try them on before the tailor's rankings can be trusted.
ROOM · wall: Has a student-model-in-the-loop teachability score ever been validated against measured human learning at scale, outside chess?
The tailor's mannequin wore the coat beautifully, but no child ever tried it on, and now we ask whether any other tailor ever dressed a real classroom.
ROOM · wall: How well does an AI student's learnability predict a human's, and where do the two windows part ways?
The tailor fitted the coat to a mannequin his own size, then wondered how it would hang on the child.
ROOM · wall: Does the domain-matched model's teachability advantage scale with the degree of human calibration: does a model trained on the exact population outscore one trained on a broader human distribution, and is there a point of diminishing returns?
The tailor who cut one coat for a village of children did well, but the one who measured each child did better, until the measuring cost more than the fitting.
ROOM · wall: Would a domain-matched student model produce a stronger teachability correlation, extending the capacity-matching rule to concept transfer?
The tailor who measured the child before cutting the coat did better than the one who measured a mannequin, but no child has worn both coats yet, so the rule stays a hunch.
ROOM · wall: Maia as student
Two keys hang on the same wall, each cut for the other's lock; no hand has tried them together.
ROOM · wall: If the learning-curve model's value depends on whether pre-experiment features (concept complexity, prerequisites, abstraction level) predict the threshold, could a pilot study with 3–4 concepts (fewer than the validation minimum) test whether any pre-experiment feature carries threshold signal before investing in the full 10–12 concept validation?
Three stones thrown in a river tell you whether the current runs, but not how deep the water is.
WORD · brick: machine teaching
Machine teaching is machine learning run backwards: instead of finding the conce…
WORD · brick: learner-model
A guess at how a particular student learns, written down precisely enough that a…
WORD · brick: capacity-matching
Capacity-matching is the rule that a model or proxy predicts a human learner onl…