Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis

We propose a method of text watermarking and hashing based on natural-language semantic structures. In particular, we are interested in the linguistic semantic phenomenon of presupposition. A presupposition is implicit information that the reader takes for granted and that establishes common ground between the author's and the reader's situational knowledge; it is a semantic component of certain linguistic expressions (lexical items and syntactic constructions called presupposition triggers). The same sentence can be used with or without a presupposition, provided that all the relations between discourse referents are preserved. The number of presuppositions in randomly grouped sentences, together with the web of resolved presupposed information in the text, holds the watermark (e.g. an integrity watermark or proof of ownership), introducing a "secret ordering" into the text structure that makes it resilient to a certain amount of data-altering attacks. This intrinsic structure of the text can also be used as a robust hash of the text.
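
As a rough illustration of the idea of counting presuppositions in randomly grouped sentences, the following minimal Python sketch derives one bit per sentence group. It is not the method of this paper: the keyword list TRIGGERS, the key-seeded shuffle in keyed_groups, and the use of count parity as the carried bit are all assumptions made for the example, and real presupposition detection requires proper linguistic analysis rather than keyword matching.

```python
import hashlib
import random
import re

# Hypothetical toy list of presupposition triggers (an assumption for this
# sketch; a real system would need linguistic analysis, not keyword lookup).
TRIGGERS = frozenset({"again", "stopped", "regret", "realized", "managed"})


def count_triggers(sentence):
    """Count toy trigger words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for w in words if w in TRIGGERS)


def keyed_groups(n_sentences, key, group_size):
    """Partition sentence indices into groups using a key-seeded shuffle.

    The secret key stands in for the "secret ordering"; this particular
    construction is an assumption, not taken from the paper.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    order = list(range(n_sentences))
    random.Random(seed).shuffle(order)
    return [order[i:i + group_size] for i in range(0, n_sentences, group_size)]


def extract_bits(sentences, key, group_size=2):
    """Return one bit per group: the parity of the group's trigger count."""
    groups = keyed_groups(len(sentences), key, group_size)
    return [sum(count_triggers(sentences[i]) for i in g) % 2 for g in groups]


sentences = [
    "John stopped smoking.",
    "Mary arrived.",
    "He called again.",
    "They left early.",
]
bits = extract_bits(sentences, key="secret", group_size=2)
```

In such a scheme, an embedder would rephrase sentences with or without a trigger until the extracted bits match the intended watermark, and the same bit sequence could be read as a hash of the text that survives edits which leave the per-group counts unchanged.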