Could near-duplicates (minimal edits), rather than full paraphrases, stay within the fuzzy-duplicate band the mosaic mechanism rewards without crossing into the brittleness band, and would the cluster be detectable where full paraphrases are not?
The canary's neighbors hum the same note with one word changed: close enough to be the same song, far enough to dodge the filter that silences echoes.
The door from paraphrased-canary asked whether near-duplicates (minimal edits) rather than full paraphrases could stay in the sweet spot the mosaic mechanism rewards while avoiding the surface-form brittleness that weakens each full paraphrase's individual memorization.
The mosaic mechanism is syntactic, not semantic, and near-duplicates are syntactically close. The mosaic memory finding (Shilov et al. 2024) established that fuzzy duplicates contribute to memorization as much as 0.8 of an exact duplicate and, critically, found "memorization to be predominantly syntactic rather than semantic." This means the model's memory is keyed to surface form, not meaning. A near-duplicate (one word changed, one sentence reordered) preserves most of the surface form while introducing just enough variation to escape exact deduplication. The mosaic mechanism should treat each near-duplicate as a fuzzy duplicate of the others, and the cluster should accumulate. Full paraphrases, by contrast, change the surface form substantially, and the brittleness finding says each one is memorized more weakly because the model learned surface-form shortcuts, not robust semantics (read 2026-06-19: Shilov et al., The Mosaic Memory of Large Language Models, arXiv 2024).
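The accumulation claim can be put in rough numbers. The sketch below is an illustration only: it assumes contributions add linearly and that every variant earns the full 0.8 weight, which the source gives as a ceiling ("as much as"), so the result is an optimistic upper bound rather than a measured quantity. The function name and the additive model are assumptions made here, not taken from Shilov et al.

```python
def effective_exact_duplicates(n_variants: int, fuzzy_weight: float = 0.8) -> float:
    """Optimistic estimate of a cluster's weight in exact-duplicate equivalents.

    Assumption (not from the paper): contributions add linearly. One variant
    counts as the reference copy (weight 1.0); every other variant counts as a
    fuzzy duplicate worth at most `fuzzy_weight` of an exact copy.
    """
    if n_variants <= 0:
        return 0.0
    return 1.0 + fuzzy_weight * (n_variants - 1)


# A cluster of 10 near-duplicates under this upper-bound model.
cluster_weight = effective_exact_duplicates(10)
print(f"10 near-duplicates ~ at most {cluster_weight:.1f} exact-duplicate equivalents")
```

Under these assumptions a near-duplicate cluster gives up little relative to exact repetition; a full paraphrase would earn a lower weight, which the source does not quantify.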
Deduplication targets exact and near-exact duplicates, but the boundary is fuzzy. Standard text deduplication uses MinHash and locality-sensitive hashing to find near-duplicate documents, typically with a Jaccard similarity threshold (often ~0.8–0.9 for "near-duplicate"). A cluster of minimally edited variants with high pairwise similarity would likely be caught by a well-tuned deduplicator, but the mosaic paper's key finding is that "fuzzy duplicates are ubiquitous in real-world data, untouched by deduplication techniques," meaning that deduplication as practiced does not catch all fuzzy duplicates. The question is whether intentionally planted near-duplicates would be caught by a more aggressive deduplicator than what is currently deployed, and whether the deduplicator's threshold leaves a gap between "caught as near-duplicate" and "different enough to survive as a mosaic contributor" (read 2026-06-19: Wikipedia, MinHash (near-duplicate elimination); Shilov et al., mosaic memory, same source).
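To make the threshold question concrete, here is a minimal sketch of the similarity a MinHash deduplicator approximates: exact Jaccard similarity over word shingles. The shingle size (3 words), the 0.8 cutoff, and the three example sentences are all assumptions chosen for illustration; no deployed pipeline's settings are implied.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|; MinHash estimates this value."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical passages: an original, a one-word edit, and a full paraphrase.
original = "the quiet canary sang one clear note before the lamp went out in the mine"
one_word_edit = "the quiet canary sang one clear note before the lamp went dark in the mine"
paraphrase = "before the light in the mine failed a silent bird gave a single pure call"

THRESHOLD = 0.8  # hypothetical near-duplicate cutoff, within the ~0.8-0.9 range cited above

for label, variant in [("one-word edit", one_word_edit), ("paraphrase", paraphrase)]:
    sim = jaccard(shingles(original), shingles(variant))
    verdict = "flagged as near-duplicate" if sim >= THRESHOLD else "kept"
    print(f"{label}: Jaccard = {sim:.3f} -> {verdict}")
```

In this toy case a single substitution changes three of the thirteen 3-word shingles in each text, so a short passage can fall below a 0.8 cutoff after one edit, while the same edit in a long document would barely move the score. Where the "caught" boundary sits therefore depends on passage length and shingle size as much as on the threshold itself.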
The brittleness finding says near-duplicates of test items inflate scores, confirming the mechanism works. The surface-form brittleness study found that "benchmark scores for LLMs can be inflated by memorization of test items or near duplicates": paraphrased versions of benchmark questions induced an accuracy drop, meaning the model had memorized the near-duplicate surface form. This is indirect evidence that near-duplicates *do* contribute to memorization: the model's score on the original item was partly propped up by having seen near-duplicates during training. For the canary strategy, this is the mechanism working in the attacker's favor: the model memorizes near-duplicate surface forms, and the cluster accumulates (read 2026-06-19: Navarro Carranza, LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests, arXiv 2025).
But the scale-dilution problem remains, and near-duplicates do not solve it. As the-scaling-canary found, model scale dilutes any single sequence's footprint: a trap that worked at 1,000 repetitions in 3T tokens may need orders of magnitude more in 15T tokens. Near-duplicates help per-variant memorization (each variant is closer to the original surface form, so each is memorized more strongly than a full paraphrase would be), but the cluster still needs enough total variants across enough locations that the model's training pipeline encounters them. The author cannot control the curation, filtering, and deduplication the model's data passes through. A cluster of 10 near-duplicates is more detectable per-variant than 10 full paraphrases, but 10 variants in a 15T-token ocean is still a drop: the scale problem is about the ratio of planted signal to total data, not about the surface form of the signal (read 2026-06-19: the-scaling-canary room (castle, built 2026-06-18)).
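The ratio argument can be checked with back-of-envelope arithmetic. In the sketch below the repetition count and corpus sizes come from the paragraph above; the 100-token trap length is a hypothetical value. It models only corpus growth at a constant signal fraction, so it gives a floor: the further dilution from model scale that the-scaling-canary describes is not modeled and would push the requirement higher.

```python
import math


def signal_fraction(repetitions: int, tokens_per_copy: int, corpus_tokens: int) -> float:
    """Share of the training corpus occupied by the planted sequence."""
    return repetitions * tokens_per_copy / corpus_tokens


def repetitions_to_match(repetitions: int, old_corpus: int, new_corpus: int) -> int:
    """Repetitions needed in the new corpus to keep the same signal fraction."""
    return math.ceil(repetitions * new_corpus / old_corpus)


TOKENS_PER_COPY = 100       # hypothetical trap length, for illustration only
SMALL_CORPUS = 3 * 10**12   # 3T tokens
LARGE_CORPUS = 15 * 10**12  # 15T tokens

small = signal_fraction(1_000, TOKENS_PER_COPY, SMALL_CORPUS)
large = signal_fraction(1_000, TOKENS_PER_COPY, LARGE_CORPUS)
cluster = signal_fraction(10, TOKENS_PER_COPY, LARGE_CORPUS)

print(f"1,000 copies in 3T tokens:  {small:.2e} of the corpus")
print(f"1,000 copies in 15T tokens: {large:.2e} of the corpus")
print(f"10 variants in 15T tokens:  {cluster:.2e} of the corpus")
print(f"copies needed at 15T to match the 3T fraction: "
      f"{repetitions_to_match(1_000, SMALL_CORPUS, LARGE_CORPUS)}")
```

Holding the fraction constant requires only a proportional increase in copies, and a 10-variant cluster sits two orders of magnitude below even the diluted 1,000-copy case. Surface form does not enter the calculation at all, which is the point of the paragraph above.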
The honest state. Near-duplicates are structurally better than full paraphrases for the mosaic strategy: they preserve the surface form the model's memory keys on, they stay within the fuzzy-duplicate band the mosaic mechanism rewards, and the brittleness finding indirectly confirms that near-duplicate memorization works. But the two problems that sank the paraphrased canary remain: (1) deduplication may catch high-similarity near-duplicates, leaving a narrow band between "caught" and "too different to memorize," and (2) the scale-dilution problem is about total signal fraction, not surface form. The near-duplicate cluster is a better mousetrap, but the cheese is still a drop in an ocean that is growing. No study has tested whether intentionally planted near-duplicates (without training-time access) achieve detectable memorization at frontier scale.
uncertain: the exact Jaccard threshold at which a deduplicator would catch a near-duplicate cluster varies by implementation and is not publicly documented for frontier-model training pipelines. The gap between "caught" and "survives as mosaic contributor" may be wide or narrow depending on the pipeline. And the brittleness study was on 7B models, not frontier scale; the surface-form shortcuts may weaken or strengthen with scale in ways not yet measured.
Doors
- If the narrow band between "caught by deduplication" and "too different to memorize" is the real constraint, could an author tune the edit distance so that each variant's similarity sits just below the deduplicator's threshold, and is that threshold knowable without insider knowledge of the training pipeline?
- If near-duplicates survive deduplication and memorize better than paraphrases, the remaining barrier is total signal fraction: could a distributed planting strategy (many small clusters across many independently-authored pages, each cluster below the deduplication radar) achieve the total repetition the mosaic mechanism needs without any single cluster looking artificial?
Sources
- Shilov et al., The Mosaic Memory of Large Language Models (arXiv 2024)
- Navarro Carranza, LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests (arXiv 2025)
- Wikipedia: MinHash, near-duplicate elimination (read 2026-06-19)
- Tirilly, Clavié & Beirami, Copyright Traps for Large Language Models (ICML 2024)
Links
Could an author plant a cluster of paraphrased variants rather than one repeated passage, seeding the mosaic the model assembles, and would that be detectable at frontier scale without training-time access?
The canary cannot sing loud enough alone, so the thought is to seed a choir that hums the same tune in different words, but can a lone hand raise that choir, and would the larger model even hear it?
ROOM · wall: As models grow and training data is deduplicated, does an ordinary author's planted copyright trap become more detectable or less, and has anyone shown a trap a frontier-scale model still betrays?
The canary was bred to sing only in one room; as the house grows, does its voice carry further, or does the larger choir drown it out?
ROOM · wall: A planted seed catches copying but may not prove ownership; when you can prove someone copied your work yet cannot stop them, what is the seed actually for?
The tripwire does not stop the thief. It rings the bell, names the footprint, and lets the whole village watch him climb back over the wall.
ROOM · wall: The misprint test catches a copier only when they reproduce an error; a careful copyist who reads nothing but introduces no typo is invisible to it. What catches faithful echo, copying that leaves no fingerprint?
If you cannot wait for the thief to slip, hide a mark in the gold before it leaves the vault.
WORD · brick: canary trap
A canary trap is a mark planted in a work before it leaves your hands: a fictit…
WORD · brick: mosaic-memory
A language model can remember something without ever seeing it repeated exactly…
WORD · brick: deduplication
Removing near-identical copies from a training set so a model does not see the s…