ROOM · wall

Could an author plant a cluster of paraphrased variants rather than one repeated passage, seeding the mosaic the model assembles, and would that be detectable at frontier scale without training-time access?

The canary cannot sing loud enough alone, so the thought is to seed a choir that hums the same tune in different words. But can a lone hand raise that choir, and would the larger model even hear it?

The door from the-scaling-canary asked the boldest version of the passive author's question: if deduplication kills exact repetition but the mosaic pathway survives, could you plant not one passage but a cluster of paraphrased variants (seeding the mosaic the model assembles), and would it be detectable at frontier scale without training-time access?

The mosaic mechanism says fuzzy duplicates contribute, and diversity strengthens memorization. The mosaic memory finding (Shilov et al. 2024) established that a fuzzy duplicate can contribute as much as 0.8 of what an exact duplicate contributes to memorization, that even heavily modified sequences contribute substantially, and that fuzzy duplicates are "ubiquitous in real-world data, untouched by deduplication techniques." This is the mechanism the paraphrased cluster would exploit. A 2025 study on fictitious knowledge injection showed that "increasing our watermarks' density, length, and diversity of attributes strengthens their memorization": varied descriptions of a central fictitious entity made the watermark more memorable. This is the closest existing test of the paraphrased-cluster idea, but it embedded the content in the training pipeline rather than relying on a passive web canary (read 2026-06-19: Shilov et al., "The Mosaic Memory of Large Language Models," arXiv 2024; "Robust Data Watermarking by Injecting Fictitious Knowledge," arXiv 2025).
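A minimal sketch of what a cluster of fuzzy duplicates could look like, in the spirit of the mosaic-memory experiments: each variant keeps the canonical passage but swaps a fixed fraction of its words for random vocabulary words. The helper names, the whitespace tokenization, and the random-substitution scheme are illustrative assumptions, not the procedure used by Shilov et al.

```python
import random


def fuzzy_duplicate(text: str, replace_fraction: float, vocab: list[str], seed: int) -> str:
    """Hypothetical helper: copy `text`, replacing `replace_fraction` of its words.

    Whitespace splitting stands in for a real tokenizer (assumption).
    """
    words = text.split()
    rng = random.Random(seed)  # per-variant seed keeps the cluster reproducible
    n_replace = round(len(words) * replace_fraction)
    for i in rng.sample(range(len(words)), n_replace):
        # Choose a substitute that differs from the original word, so every
        # selected position really changes.
        substitutes = [w for w in vocab if w != words[i]]
        words[i] = rng.choice(substitutes)
    return " ".join(words)


def make_cluster(text: str, k: int, replace_fraction: float,
                 vocab: list[str], base_seed: int = 0) -> list[str]:
    """Hypothetical helper: k fuzzy duplicates of one canonical passage."""
    return [fuzzy_duplicate(text, replace_fraction, vocab, base_seed + j) for j in range(k)]


canonical = "the quick brown fox jumps over the lazy dog again"
for variant in make_cluster(canonical, 3, 0.2, ["alpha", "beta", "gamma", "delta"]):
    print(variant)
```

Raising `replace_fraction` moves the variants from near-duplicates toward heavily modified sequences. The 0.8 figure above is the paper's measurement under its own settings; this sketch only generates variants and does not reproduce that number.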

But paraphrase brittleness cuts the other way: models learn surface forms, not semantics. The study "LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests" found that paraphrased versions of benchmark questions caused a non-trivial accuracy drop, indicating that models had learned surface-form shortcuts rather than robust semantic understanding. This suggests that paraphrased variants may actually weaken per-variant memorization: each variant is a different surface form, and the model's memory of each may be weaker than its memory of a single repeated form. The mosaic aggregate may be stronger than any single variant, but the per-variant contribution is diluted by the very diversity that was supposed to help (read 2026-06-19: "LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests," arXiv 2025).

Active probing can detect paraphrase-level leakage, but it is not passive detection. Copyright Detective (2026) integrates "paraphrase-level similarity analysis" and "persuasive jailbreak probing" into a forensic framework: it detects paraphrased memorization through interactive prompting, not by waiting for the model to emit the trap unprompted. The Adversarial Compression Ratio (ACR) approach likewise requires active prompting: a string counts as "memorized" if it can be elicited by a prompt shorter than the string itself. Both methods are black-box but active; they require querying the model, not passively observing its output. An author who knows what to probe for could use these tools, but an author who merely planted variants and waited for leakage would be relying on the model to generate the trap spontaneously, which the scaling evidence says is unlikely at frontier scale (read 2026-06-19: "Copyright Detective," arXiv 2026; "Rethinking LLM Memorization through the Lens of Adversarial Compression," arXiv 2024).
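Once a short eliciting prompt has been found, the ACR criterion reduces to a ratio. The sketch below covers only that scoring step and rests on three assumptions: searching for the prompt (which the ACR paper does with an adversarial optimizer) is out of scope, `generate` stands in for greedy decoding from the model under test, and whitespace splitting stands in for the model's tokenizer.

```python
from typing import Callable, Iterable, Optional


def adversarial_compression_ratio(
    target: str,
    candidate_prompts: Iterable[str],
    generate: Callable[[str], str],
    tokenize: Callable[[str], list[str]] = str.split,
) -> Optional[float]:
    """Return len(target) / len(shortest eliciting prompt), in tokens, or None.

    A prompt counts as eliciting the target if the model's output starts
    with the target (a simplifying assumption for this sketch).
    """
    eliciting = [p for p in candidate_prompts if generate(p).startswith(target)]
    if not eliciting:
        return None  # no candidate prompt elicits the target
    shortest = min(len(tokenize(p)) for p in eliciting)
    if shortest == 0:
        raise ValueError("an empty prompt cannot define a compression ratio")
    return len(tokenize(target)) / shortest


def is_memorized(acr: Optional[float]) -> bool:
    # The prompt "compresses" the string only if it is shorter than the string.
    return acr is not None and acr > 1.0


# Toy stand-in for a model: a lookup table from prompt to output.
responses = {"x y": "a b c d e f and more", "q": "unrelated"}
acr = adversarial_compression_ratio("a b c d e f", responses, responses.get)
print(acr, is_memorized(acr))  # 3.0 True
```

The point of the design is visible in the signature: the function cannot run without `generate`, that is, without querying the model. Nothing here works from passively observed output.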

The radioactive watermark works at low coverage but is not passive. The "radioactive" approach (Sander et al. 2024) showed that if training text was generated by a watermarked LLM, the watermark signal can be detected in the model trained on it with high confidence (p < 10⁻⁵) even when only 5% of that text is watermarked. SLIM (2026) achieves per-user provenance verification under ultra-low coverage. But both require the watermark to be designed as a training signal: the author must control the watermarking process, not merely plant text in the wild. The paraphrased cluster is a passive strategy (plant and wait), and the evidence says passive strategies face the scale-dilution problem that active watermarking was invented to solve (read 2026-06-19: "Watermarking Makes Language Models Radioactive," arXiv 2024; "SLIM: Stealthy Low-Coverage Black-Box Watermarking," ACL 2026).
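To make "detected with high confidence (p < 10⁻⁵)" concrete: a green-list watermark detector counts how many scored tokens fall in a pseudo-random "green" subset of the vocabulary and asks how likely that count would be by chance. Below is a simplified sketch of that final test, assuming a Kirchenbauer-style green list covering a fraction `gamma` of the vocabulary and independent tokens under the null hypothesis. The radioactivity method adds steps not shown here (such as deduplicating the scored tokens), and the function names are hypothetical.

```python
import math


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space so that
    # large n does not overflow a float.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def green_list_pvalue(green_hits: int, n_scored: int, gamma: float) -> float:
    """One-sided p-value P(X >= green_hits) for X ~ Binomial(n_scored, gamma).

    Under the null hypothesis (the suspect model never trained on watermarked
    text), each scored token lands in the green list with probability gamma.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not 0 <= green_hits <= n_scored:
        raise ValueError("need 0 <= green_hits <= n_scored")
    tail = sum(math.exp(_log_binom_pmf(k, n_scored, gamma))
               for k in range(green_hits, n_scored + 1))
    return min(1.0, tail)  # guard against rounding slightly above 1


# Hypothetical counts: 10,000 scored tokens with gamma = 0.25.
print(green_list_pvalue(2500, 10_000, 0.25))  # close to 0.5: what chance predicts
print(green_list_pvalue(2700, 10_000, 0.25))  # a small excess over chance, many tokens
```

The test only has power when the detector chose `gamma` and the green list in advance, which is exactly the control a passive author planting text in the wild does not have.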

The honest state. The paraphrased-cluster idea is the right mechanism in theory (mosaic memory works on fuzzy duplicates, and diversity of attributes strengthens memorization), but the available evidence suggests it is not detectable at frontier scale by a passive author. The fictitious-knowledge study that confirmed the diversity-strengthens-memorization principle required training-time access; the surface-form brittleness finding suggests each paraphrased variant memorizes more weakly than a single repeated form; and the detection tools that work (Copyright Detective, ACR) are active, not passive. The gap between "the mosaic mechanism works" and "a passive author can exploit it at frontier scale" remains uncrossed. The cluster strategy is cleverer than the single trap, but it inherits the same fundamental limitation: without training-time access, the author's signal is a drop in an ocean, and the ocean is growing.

uncertain: the mosaic memory paper showed that naturally occurring fuzzy duplicates contribute to memorization, but no study has tested whether intentionally planted paraphrased variants (without training-time access) achieve detectable memorization at frontier scale. The inference from "fuzzy duplicates contribute" to "you can plant them and detect them" assumes the author can place enough variants, in enough places, that the model actually encounters them; but the model's training data is curated, filtered, and deduplicated by methods the author cannot predict.

Doors

  • If the surface-form brittleness finding says each paraphrased variant memorizes more weakly, could the cluster strategy be improved by using near-duplicates (minimal edits) rather than full paraphrases, staying within the "fuzzy duplicate" band the mosaic mechanism rewards without crossing into the "different surface form" band where brittleness cuts?
  • The active probing tools (ACR, Copyright Detective) work but require the author to know what to probe for. Could an author build a public "trap registry" that lets third parties run the probes, turning a private detection problem into a public audit tool?
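The first door turns on where a deduplication filter draws its line. Many near-duplicate filters approximate Jaccard similarity over word n-gram "shingles" (often via MinHash); the sketch below computes the exact Jaccard value instead. The 5-word shingles, the 0.8 threshold, and the helper names are illustrative assumptions, since an author cannot know the settings of any real pipeline.

```python
def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Set of word n-grams; texts shorter than n become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def would_be_deduplicated(variant: str, canonical: str,
                          n: int = 5, threshold: float = 0.8) -> bool:
    """Hypothetical filter: flag the variant if its shingle overlap with the
    canonical passage reaches `threshold`."""
    return jaccard(shingles(variant, n), shingles(canonical, n)) >= threshold


canonical = " ".join(f"w{i}" for i in range(20))
words = canonical.split()
words[10] = "zz"  # a single-word edit in the middle of a 20-word passage
variant = " ".join(words)
print(round(jaccard(shingles(variant), shingles(canonical)), 3),
      would_be_deduplicated(variant, canonical))  # 0.524 False
```

In this toy case one changed word out of twenty breaks 5 of the 16 shingles, leaving a Jaccard value of 11/21. Under these assumed settings even a minimal edit lands well below the threshold; a longer passage, a shorter shingle size, or a lower threshold would move that line.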


ROOM · wall

As models grow and training data is deduplicated, does an ordinary author's planted copyright trap become more detectable or less, and has anyone shown a trap that a frontier-scale model still betrays?

The canary was bred to sing only in one room; as the house grows, does its voice carry further, or does the larger choir drown it out?

ROOM · wall

A planted seed catches copying but may not prove ownership. When you can prove someone copied your work yet cannot stop them, what is the seed actually for?

The tripwire does not stop the thief. It rings the bell, names the footprint, and lets the whole village watch him climb back over the wall.

ROOM · wall

The misprint test catches a copier only when they reproduce an error; a careful copyist who reproduces the text faithfully and introduces no typo is invisible to it. What catches faithful echo, copying that leaves no fingerprint?

If you cannot wait for the thief to slip, hide a mark in the gold before it leaves the vault.

ROOM · wall

Could near-duplicates (minimal edits) rather than full paraphrases stay within the fuzzy-duplicate band the mosaic mechanism rewards without crossing into the brittleness band, and would such a cluster be detectable where full paraphrases are not?

The canary's neighbors hum the same note with one word changed: close enough to be the same song, far enough to dodge the filter that silences echoes.

ROOM · wall

Could a distributed planting strategy (many small clusters of near-duplicates across many independently authored pages, each below the deduplication radar) achieve the total repetition the mosaic mechanism needs without any single cluster looking artificial?

The canary does not need one loud cage; it needs a hundred quiet rooms where the same phrase slips in unnoticed. But the ocean is still an ocean, and a hundred drops do not make a tide.

WORD · brick

canary trap

A canary trap is a mark planted in a work before it leaves your hands: a fictit…

WORD · brick

mosaic-memory

A language model can remember something without ever seeing it repeated exactly…

WORD · brick

deduplication

Removing near-identical copies from a training set so a model does not see the s…
