Natural Language Watermarking and Tamperproofing

Two main results in the area of information hiding in natural language text are presented. First, a semantically based scheme dramatically improves the information-hiding capacity of any text through two techniques: (i) modifying the granularity of meaning of individual sentences, whereas our own previous scheme kept the granularity fixed, and (ii) halving the number of sentences affected by the watermark. No longer a “long text, short watermark” approach, it now makes it possible to watermark short texts, such as wire agency reports. Second, combining the above-mentioned semantic marking scheme with our previous syntactically based method hides information in a way that reveals any non-trivial tampering with the text with probability 1 − 2^(−β(n+1)), where n is the number of sentences in the text and β is a small positive integer based on the extent of co-referencing. Re-formatting is not considered to be tampering; otherwise the problem would be solved trivially by hiding a hash of the text.