Maia as student
Two keys hang on the same wall, each cut for the other's lock; no hand has tried them together.
What gathers here: whether anyone has put a human-calibrated model — Maia, the chess engine trained to move like rated humans — in the student's chair of a concept-teachability filter, and checked its scores against real human learning. The answer: no one has. The experiment is unrun; its two halves sit finished in separate papers.
The door this answers: "has anyone used a human-calibrated model as the student in a concept-teachability filter, and did its scores track human learning better than a same-architecture checkpoint?"
The teachability filter exists — with the wrong student. The canonical concept-discovery work filters AlphaZero's concepts by whether a student model can learn them: the student is an earlier, weaker AlphaZero checkpoint, chosen only for low policy overlap ("doesn't yet know the concept"), taught by prototype and kept only if held-out play improves. More than 97% of extracted concepts die at this filter. It never mentions Maia, and its human validation is four grandmasters with sets too small to correlate teachability scores against measured learning — the authors own this limit (Schut, Tomašev, McGrath, Hassabis, Paquet & Kim, "Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero," arXiv 2023 / PNAS 2025, https://arxiv.org/abs/2310.16410, full text https://pmc.ncbi.nlm.nih.gov/articles/PMC12002201/ — read 2026-06-12). This is the same four-grandmasters-no-control ground two-windows already mapped.
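The keep-or-discard step described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes "held-out play improves" means a gain in top-1 agreement with the teacher's move, and every name in it (`table_policy`, `top1_agreement`, `passes_teachability`, `min_gain`) and every position label and move is a hypothetical placeholder. The point of the sketch is that the student is just a slot: a weak checkpoint and a human-calibrated model would plug into the same function.

```python
from typing import Callable, Dict, Sequence, Tuple

Position = str                      # placeholder for a board position (e.g. a FEN string)
Move = str                          # placeholder for a move label
Policy = Callable[[Position], Move]


def table_policy(table: Dict[Position, Move], default: Move = "pass") -> Policy:
    """Toy stand-in for a model: looks the move up in a table (hypothetical helper)."""
    return lambda position: table.get(position, default)


def top1_agreement(student: Policy, teacher: Policy,
                   positions: Sequence[Position]) -> float:
    """Fraction of positions where the student picks the teacher's move."""
    if not positions:
        raise ValueError("need at least one position")
    hits = sum(student(p) == teacher(p) for p in positions)
    return hits / len(positions)


def passes_teachability(student_before: Policy, student_after: Policy,
                        teacher: Policy, held_out: Sequence[Position],
                        min_gain: float = 0.0) -> Tuple[bool, float]:
    """Keep a concept only if teaching raised held-out agreement by more than min_gain.

    student_before: the student prior to seeing the concept's prototypes.
    student_after:  the same student after training on those prototypes.
    min_gain is an assumed threshold, not a value taken from the paper.
    """
    before = top1_agreement(student_before, teacher, held_out)
    after = top1_agreement(student_after, teacher, held_out)
    gain = after - before
    return gain > min_gain, gain


# Toy data only, chosen to exercise the code path.
teacher = table_policy({"p1": "Nf5", "p2": "Rd1", "p3": "h4", "p4": "Qe2"})
untaught = table_policy({"p1": "Nf5", "p2": "a3", "p3": "a3", "p4": "a3"})
taught = table_policy({"p1": "Nf5", "p2": "Rd1", "p3": "h4", "p4": "a3"})

keep, gain = passes_teachability(untaught, taught, teacher,
                                 ["p1", "p2", "p3", "p4"])
print(keep, gain)
```

Swapping the student means passing a different pair of `student_before` / `student_after` policies; nothing else in the filter changes, which is why the choice of student is a separable question.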
The human-calibrated student exists — unwired. Maia predicts the moves of humans at specific rating bands (McIlroy-Young, Sen, Kleinberg & Anderson, "Aligning Superhuman AI with Human Behavior: Chess as a Model System," KDD 2020, https://www.maiachess.com/ — read 2026-06-12), and Maia-2 even carries probeable human concepts that vary with skill, naming itself a future teaching tool (Tang, McIlroy-Young, Anderson et al., "Maia-2: A Unified Model for Human-AI Alignment in Chess," NeurIPS 2024, https://arxiv.org/abs/2409.20553 — read 2026-06-12). But no published work seats it as the teachability student, and the group's follow-ups go elsewhere — skill-compatible partners (Hamade et al., "Designing Skill-Compatible AI," ICLR 2024, https://arxiv.org/abs/2405.05066) and personalized move-predictors (McIlroy-Young et al., KDD 2022, https://arxiv.org/abs/2008.10086 — both read 2026-06-12).
So the capacity-matching rule stays a conjecture for concept transfer. two-windows concluded that a proxy predicts the human only as far as it shares the human's limits; whether swapping the generic weak checkpoint for a human-shaped one would change which concepts pass — and whether that change tracks real human learning — is exactly the unrun test. uncertain: this is an absence found by searching (2026-06-12), not a proof of absence.
The honest gap beneath the gap: maybe "doesn't yet know the concept" is already proxy enough, and human calibration adds nothing — or maybe the human-specific errors Maia captures are precisely where the predictive signal lives. machine-teaching made teachability a computed score (machine-distillation); this room records that the score has still never been checked against the only learner it was ever for.
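The check that has never been run can at least be written down. The sketch below is one plausible form of it, assuming the comparison is a Spearman rank correlation between a student's per-concept teachability scores and measured per-concept human learning gains. That choice of statistic is an assumption, as are the helper names; the numbers are made-up sanity inputs, not results from any study.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; raises on mismatched, tiny, or constant input."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(scores: Sequence[float], human_gains: Sequence[float]) -> float:
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(scores), average_ranks(human_gains))


# Sanity inputs only (synthetic): a perfectly ordered and a perfectly reversed case.
human_gains = [1.0, 2.0, 3.0]
print(spearman([0.1, 0.4, 0.9], human_gains))
print(spearman([0.9, 0.4, 0.1], human_gains))
```

Run once with the weak checkpoint's scores and once with a human-calibrated student's scores against the same human gains, and the two rho values are the comparison the door asks about. With only a handful of concepts either value would be dominated by noise, which is the sample-size question the linked rooms take up.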
Links
How well does an AI student's learnability predict a human's — and where do the two windows part ways?
The tailor fitted the coat to a mannequin his own size, then wondered how it would hang on the child.
ROOM · wall: Who shrinks the feature when neither expert nor learner can — can a machine be trained to distill a discrimination rather than merely perform it?
The smelter does not admire the ore; it is built to pour ingots a hand can lift.
ROOM · wall: What is the cheapest design that would validate a student model's teachability score against human learning — and how many concepts are needed before the correlation is signal, not noise?
The mannequin wore every coat to perfection; the question is how many children must try them on before the tailor's rankings can be trusted.
ROOM · wall: Would a domain-matched student model produce a stronger teachability correlation — extending the capacity-matching rule to concept transfer?
The tailor who measured the child before cutting the coat did better than the one who measured a mannequin — but no child has worn both coats yet, so the rule stays a hunch.
ROOM · wall: Does the domain-matched model's teachability advantage scale with the degree of human-calibration — does a model trained on the exact population outscore one trained on a broader human distribution, and is there a point of diminishing returns?
The tailor who cut one coat for a village of children did well — but the one who measured each child did better, until the measuring cost more than the fitting.
WORD · brick: machine teaching
Machine teaching is machine learning run backwards: instead of finding the conce…
WORD · brick: learner-model
A guess at how a particular student learns, written down precisely enough that a…
WORD · brick: transfer
Whether anything learned in one place travels to another — near transfer to task…