Natural language watermarking via morphosyntactic alterations

We develop a morphosyntax-based natural language watermarking scheme. In this scheme, a text is first transformed into a syntactic tree diagram in which the hierarchies and functional dependencies are made explicit. The watermarking software then operates on the sentences in syntax tree format and executes binary changes, under the control of WordNet and a dictionary, to avoid semantic degradation. A certain level of security is provided via key-controlled randomization of the morphosyntactic tools and the insertion of void watermarks. The security and payload aspects are evaluated statistically, while imperceptibility is measured using edit-hit counts based on human judgments. We observe that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking, and that the free word order of a language such as Turkish is an added advantage.
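
The key-controlled randomization and void watermark insertion mentioned above can be illustrated with a minimal sketch. It is not the scheme's actual implementation: the tool names in TOOLS, the function names, the void_rate parameter, and the reading of a "void watermark" as a keyed dummy slot that carries no payload bit are all assumptions made for illustration. Parsing and the actual sentence transformations are left out; the sketch only shows how a secret key could deterministically choose a tool and a payload-or-void decision for each sentence.

```python
import hashlib
import hmac

# Hypothetical tool names; the abstract does not list the actual tools.
TOOLS = ["active_passive", "adverb_displacement", "conjunct_order"]


def keyed_byte(key, index, label):
    """Derive one pseudo-random byte from the key, a sentence index and a label."""
    msg = f"{label}:{index}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[0]


def plan_embedding(key, n_sentences, bits, void_rate=64):
    """Return one (tool, bit) pair per sentence; bit is None for a void slot.

    void_rate is an assumed threshold out of 256: a sentence becomes a
    void slot when its keyed byte falls below it.
    """
    plan = []
    remaining = list(bits)
    for i in range(n_sentences):
        tool = TOOLS[keyed_byte(key, i, "tool") % len(TOOLS)]
        is_void = keyed_byte(key, i, "void") < void_rate
        if is_void or not remaining:
            plan.append((tool, None))  # no payload bit carried here
        else:
            plan.append((tool, remaining.pop(0)))
    return plan


if __name__ == "__main__":
    for slot in plan_embedding(b"secret key", 8, [1, 0, 1, 1]):
        print(slot)
```

Because every choice is derived from the key with HMAC, a detector holding the same key can recompute which sentences are void slots and which tool was used on each of the others, while an observer without the key cannot tell payload-bearing alterations from dummy ones.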
