Natural Language Watermarking and Tamperproofing

Two main results in the area of information hiding in natural language text are presented. First, a semantically based scheme dramatically improves the information-hiding capacity of any text through two techniques: (i) modifying the granularity of meaning of individual sentences, whereas our previous scheme kept the granularity fixed, and (ii) halving the number of sentences affected by the watermark. No longer a "long text, short watermark" approach, the scheme makes it possible to watermark short texts, such as wire agency reports. Second, combining this semantic marking scheme with our previous syntactically based method hides information in a way that reveals any non-trivial tampering with the text with probability $1 - 2^{-s(n+1)}$, where n is the number of sentences in the text and s is a small positive integer based on the extent of co-referencing. Re-formatting is not considered tampering; otherwise the problem would be solved trivially by hiding a hash of the text.
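
To give a feel for how quickly the tamper-detection guarantee approaches certainty, the following minimal sketch evaluates the bound for a few text lengths. It assumes the probability is read as 1 - 2^(-s(n+1)); the function name `detection_probability` and the sample values of n and s are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def detection_probability(n: int, s: int) -> Fraction:
    """Return 1 - 2^(-s(n+1)) as an exact fraction.

    n: number of sentences in the text (assumed non-negative).
    s: small positive integer tied to the extent of co-referencing.
    """
    if n < 0 or s < 1:
        raise ValueError("expected n >= 0 and s >= 1")
    return 1 - Fraction(1, 2 ** (s * (n + 1)))


if __name__ == "__main__":
    # Hypothetical text lengths and s values, chosen only for illustration.
    for n in (1, 5, 10):
        for s in (1, 2):
            p = detection_probability(n, s)
            print(f"n={n:2d} s={s} probability={float(p):.6f}")
```

Under this reading, even a ten-sentence text with s = 1 leaves a chance of only 1 in 2048 that tampering goes unnoticed, which is consistent with the claim that short texts can be protected.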