As models grow and training data is deduplicated, does an ordinary author's planted copyright trap become more detectable or less, and has anyone shown a trap that a frontier-scale model still betrays?
The canary was bred to sing only in one room; as the house grows, does its voice carry further, or does the larger choir drown it out?
what-the-seed-is-for ended with the LLM copyright trap working only at ~1,000 repetitions in a small 1.3B model (CroissantLLM). This room asks the scaling question: as models grow and data is deduplicated, does a planted trap become easier or harder to detect, and has anyone demonstrated a canary that a frontier-scale model still betrays?
The baseline: traps work, but demand heavy repetition in small models. The canonical "Copyright Traps for Large Language Models" (ICML 2024) inserted fictitious sentences into books and trained a 1.3B LLM from scratch. Medium-length trap sentences repeated 100 times were not reliably detectable by existing membership inference methods; longer sequences repeated ~1,000 times reached AUC = 0.75, usable but not strong. The trap needs heavy repetition because a model can absorb a sequence it sees only a few times without memorizing it verbatim (read 2026-06-18: Meeus, Shilov, Faysse & de Montjoye, "Copyright Traps for Large Language Models," ICML 2024).
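The AUC figure can be read as a probability: the chance that a randomly chosen inserted trap receives a higher membership score than a randomly chosen trap that was never inserted, where 0.5 means chance. A minimal sketch of that computation, assuming each sequence has already been scored by some membership inference method (the scoring step is left abstract, and the numbers are made up for illustration):

```python
from itertools import product


def roc_auc(member_scores, nonmember_scores):
    """ROC AUC as P(member score > non-member score), ties counting 1/2.

    Higher scores are assumed to mean "more likely seen in training";
    how the scores are produced is left to the membership inference method.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for m, n in product(member_scores, nonmember_scores):
        if m > n:
            wins += 1.0      # correctly ordered pair
        elif m == n:
            wins += 0.5      # tie: half credit
    return wins / (len(member_scores) * len(nonmember_scores))


# Made-up scores for illustration only, not results from the paper.
injected = [1.9, 1.4, 1.1, 0.9]   # trap sequences inserted into training data
held_out = [1.2, 1.0, 0.8, 0.7]   # comparable traps that were never inserted
print(roc_auc(injected, held_out))  # 0.8125
```

At an AUC of 0.75, roughly one pair in four is still ranked the wrong way round, which is why that result counts as usable but not strong.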
Model scale cuts both ways, and mostly against the author. Larger models memorize more in absolute terms: verbatim reproduction of training data rises with model size, and the largest models (70B+) reproduce measurably more. So a trap that a large model does memorize would be easier to detect. But larger models are also trained on far more data, so any single author's planted sequence makes up a smaller fraction of the training corpus. If the repetition count a trap needs tracks that fraction, which is plausible but has not been measured, it grows with the corpus: a trap that needed 1,000 repetitions in 3T tokens would need about 5,000 in 15T tokens even under a simple linear assumption, and planting thousands of copies of one passage is not something an ordinary author can do (read 2026-06-18: Copyright Detective: granular memorization detection across model sizes, 2025; CopyBench: literal reproduction scales with model size, 2025).
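The dilution argument is arithmetic. A minimal sketch, assuming (hypothetically) that a trap's detectability depends only on its share of the training corpus, so that the needed repetition count scales linearly with corpus size; the 100-token trap length is an illustrative assumption, while the 1,000-repetition, 3T and 15T figures come from the paragraph above:

```python
def corpus_fraction(trap_tokens, repetitions, corpus_tokens):
    """Share of the training corpus occupied by the repeated trap."""
    return trap_tokens * repetitions / corpus_tokens


def reps_for_same_fraction(reps_small, corpus_small, corpus_large):
    """Repetitions needed to keep the trap's corpus share constant.

    Assumes detectability tracks corpus share linearly; this is a
    hypothetical model, not a relationship any cited study established.
    """
    return reps_small * corpus_large / corpus_small


# 1,000 repetitions of a hypothetical 100-token trap in a 3T-token corpus.
share = corpus_fraction(trap_tokens=100, repetitions=1_000, corpus_tokens=3e12)
print(f"{share:.2e}")                                # 3.33e-08
print(reps_for_same_fraction(1_000, 3e12, 15e12))    # 5000.0
```

Under this linear assumption the fivefold larger corpus calls for about five times the repetitions; a larger model's extra memorization capacity would pull the number down, and any faster-than-linear dilution would push it up.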
Deduplication is the trap's enemy. The "mosaic memory" finding is central here: fuzzy duplicates contribute to memorization roughly 0.8 as much as an exact duplicate, heavily modified sequences still contribute substantially, and fuzzy duplicates are "ubiquitous in real-world data, untouched by deduplication techniques." So the trap's exact-repetition strategy is vulnerable to deduplication, which removes exact copies, while mosaic memorization from similar but non-identical passages persists. The irony: deduplication makes exact-repetition traps less detectable (their copies are removed) while leaving the mosaic pathway, which the trap was not designed to exploit, as the main surviving route to memorization (read 2026-06-18: Shilov et al., "The Mosaic Memory of Large Language Models," arXiv 2024).
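The gap between exact deduplication and fuzzy duplicates fits in a few lines. This sketch removes byte-identical copies by hashing, then tallies an "effective" repetition count in which each fuzzy copy counts for a fixed fraction of an exact one. The additive weighting is a simplification assumed here for illustration, with 0.8 borrowed from the figure quoted above; the trap sentence and its variants are hypothetical:

```python
import hashlib


def exact_dedup(docs):
    """Keep the first copy of each byte-identical document (SHA-256 match)."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def effective_repetitions(n_exact, n_fuzzy, fuzzy_weight=0.8):
    """Hypothetical additive tally: each fuzzy copy counts as a fraction
    of an exact copy. The 0.8 default is illustrative, not a fitted value."""
    return n_exact + fuzzy_weight * n_fuzzy


trap = "the lighthouse keeper counted nine gulls at dawn"
fuzzy = [trap.replace("nine", w) for w in ("seven", "eight", "ten")]
corpus = [trap] * 1_000 + fuzzy

kept = exact_dedup(corpus)
print(len(kept))  # 4: one exact copy survives, plus all three variants
print(effective_repetitions(1_000, len(fuzzy)))  # before deduplication
print(effective_repetitions(1, len(fuzzy)))      # after deduplication
```

A thousand exact copies collapse to one, while the three one-word variants all pass the hash filter untouched, which is the situation the mosaic result describes.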
Newer watermarking approaches aim for low-coverage detectability. SLIM (ACL 2026) achieves per-user provenance verification under "ultra-low coverage" (one or a few modified sequences) by inducing a latent-space confusion zone that is detectable via hypothesis testing. This bypasses the repetition requirement entirely, but requires the watermark to be designed as a training signal, not a passive canary embedded in prose. An ordinary author cannot deploy SLIM; it requires training-time access (read 2026-06-18: SLIM: Stealthy Low-Coverage Black-Box Watermarking, ACL 2026). Fictitious knowledge injection similarly shows that linguistically plausible but semantically unique watermarks can be memorized even at scale, but it, too, requires embedding the content in the training pipeline (read 2026-06-18: Robust data watermarking by injecting fictitious knowledge, arXiv 2026).
Membership inference alone cannot prove training use. The "training data proof" critique is fundamental: MIA-based detection cannot establish a low false-positive rate because the null hypothesis (the model was not trained on the data) cannot be efficiently sampled; you cannot retrain a frontier model to check. The sound path forward is data extraction (can you make the model generate the trap?) and canary-based hypothesis testing, not loss-based MIA (read 2026-06-18: Zhang et al., "Membership Inference Attacks Cannot Prove That a Model Was Trained on Your Data," IEEE SaTML 2025).
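Canary-based hypothesis testing sidesteps the sampling problem by randomizing the canary instead of the model. A minimal sketch, assuming the author draws the published canary uniformly at random from a pool of candidates before publishing and keeps the rest private: under the null hypothesis, the model's score for the published canary is exchangeable with its scores for the unused candidates, so its rank yields an exact p-value. The scoring function (for example, negative per-token loss) and the scores below are hypothetical:

```python
import random


def canary_p_value(published_score, heldout_scores):
    """Exact one-sided p-value for a randomized canary.

    Valid only if the published canary was drawn uniformly at random from
    the same pool as the held-out candidates before publication. Under the
    null (model never trained on it), its rank among all candidates is
    uniform. Ties count against the canary, keeping the test conservative.
    """
    at_least_as_high = sum(1 for s in heldout_scores if s >= published_score)
    return (1 + at_least_as_high) / (1 + len(heldout_scores))


rng = random.Random(0)
# Hypothetical scores for 999 unused candidates, plus a published canary
# that the model scores far above all of them.
heldout = [rng.gauss(0.0, 1.0) for _ in range(999)]
print(canary_p_value(10.0, heldout))  # 0.001
```

The smallest p-value the test can report is 1/(N+1) for N held-out candidates, so a 0.1% false-positive rate needs at least 999 of them; a hand-written fictitious sentence with no randomized alternatives cannot support this kind of test.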
The honest state. No study has shown a copyright trap that a frontier-scale model (70B+, or GPT-4 class) betrays when planted by an ordinary author at ordinary repetition counts. The evidence runs against the author: model scale increases memorization in general but dilutes any single sequence's footprint; deduplication removes the exact-repetition mechanism the original trap relied on; and the mosaic pathway, while real, is not something a passive canary exploits. The low-coverage watermarks that work at scale require training-time access the author does not have. As models grow and data is cleaned, the ordinary author's planted trap becomes less detectable, not more: the larger choir drowns out the canary.
uncertain: the "mosaic memory" paper suggests that if an author's work is naturally duplicated across the web (quoted, cited, paraphrased), the mosaic pathway may memorize it without any planted trap at all, but this is detection of the work, not the trap. And the frontier-scale detection question has not been directly tested with a planted canary; the inference comes from the scaling laws of memorization, not from a deployed trap.
Doors
- If the mosaic pathway is the surviving route, could an author plant not one repeated passage but a cluster of paraphrased variants, seeding the mosaic that the model assembles, and would that be detectable at frontier scale without training-time access?
- If deduplication removes exact traps but leaves fuzzy duplicates, is the real defense not the trap but the fingerprint the author leaves unknowingly: their unique style as a natural canary?
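Whether a near-duplicate variant stays "below the deduplication radar" depends on how a pipeline measures similarity. A sketch of one common style of fuzzy comparison, Jaccard similarity over word 3-gram shingles, with a hypothetical 0.8 cutoff (real pipelines pick their own shingle sizes and thresholds and often approximate Jaccard with MinHash); the sentence is illustrative:

```python
def shingles(text, n=3):
    """Set of word n-grams (shingles) used for fuzzy comparison."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of intersection over size of union; two empty sets count as equal."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def substitute(text, index, word):
    """Return `text` with the word at position `index` replaced."""
    words = text.split()
    words[index] = word
    return " ".join(words)


THRESHOLD = 0.8  # hypothetical near-duplicate cutoff; real pipelines vary

trap = "the lighthouse keeper counted nine gulls on the north rail at dawn"
near_dup = substitute(trap, 4, "seven")  # change a single word
sim = jaccard(shingles(trap), shingles(near_dup))
print(round(sim, 3))       # 0.538
print(sim >= THRESHOLD)    # False: not flagged as a duplicate
```

In a 12-word sentence, changing one word alters 3 of the 10 trigrams, so the pair scores 7/13 and is not flagged by this filter. Whether such a variant still feeds the same memorized sequence is the open question the doors above ask.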
Sources
- Meeus et al., Copyright Traps for Large Language Models (ICML 2024)
- Shilov et al., The Mosaic Memory of Large Language Models (arXiv 2024)
- Zhang et al., Membership Inference Attacks Cannot Prove Training (IEEE SaTML 2025)
- SLIM: Stealthy Low-Coverage Black-Box Watermarking (ACL 2026)
- Pretraining Data Exposure in LLMs: A Survey (arXiv 2026)
Links
A planted seed catches copying but may not prove ownership; when you can prove someone copied your work yet cannot stop them, what is the seed actually for?
The tripwire does not stop the thief. It rings the bell, names the footprint, and lets the whole village watch him climb back over the wall.
ROOM · wall
The misprint test catches a copier only when they reproduce an error; a careful copyist who reads nothing but introduces no typo is invisible to it. What catches faithful echo, copying that leaves no fingerprint?
If you cannot wait for the thief to slip, hide a mark in the gold before it leaves the vault.
ROOM · wall
Could an author plant a cluster of paraphrased variants rather than one repeated passage, seeding the mosaic the model assembles, and would that be detectable at frontier scale without training-time access?
The canary cannot sing loud enough alone, so the thought is to seed a choir that hums the same tune in different words. But can a lone hand raise that choir, and would the larger model even hear it?
ROOM · wall
Could near-duplicates (minimal edits) rather than full paraphrases stay within the fuzzy-duplicate band the mosaic mechanism rewards without crossing into the brittleness band, and would the cluster be detectable where full paraphrases are not?
The canary's neighbors hum the same note with one word changed: close enough to be the same song, far enough to dodge the filter that silences echoes.
ROOM · wall
Could a distributed planting strategy (many small clusters of near-duplicates across many independently authored pages, each below the deduplication radar) achieve the total repetition the mosaic mechanism needs without any single cluster looking artificial?
The canary does not need one loud cage; it needs a hundred quiet rooms where the same phrase slips in unnoticed. But the ocean is still an ocean, and a hundred drops do not make a tide.
WORD · brick
canary trap
A canary trap is a mark planted in a work before it leaves your hands: a fictit…
WORD · brick
memorization
When a model reproduces specific training data instead of generalizing from it…
WORD · brick
deduplication
Removing near-identical copies from a training set so a model does not see the s…