Empirical Paraphrasing of Modern Greek Text in Two Phases: An Application to Steganography

This paper describes the application of paraphrasing to steganography, using Modern Greek text as the cover medium. Paraphrases are learned in two phases: first, a set of shallow empirical rules is applied to every input sentence, producing an initial pool of paraphrases; the pool is then filtered using supervised learning techniques. The syntactic transformations are shallow and require minimal linguistic resources, which makes the methodology easily portable to other inflectional languages. A secret key shared between the two communicating parties lets them agree on one chosen paraphrase, whose presence or absence encodes one bit of hidden information. The ability to apply more than one rule to an input sentence simultaneously, and each rule more than once, increases the size of the paraphrase pool, thereby ensuring steganographic security.
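The key-based agreement described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the use of HMAC-SHA256 to pick the paraphrase, the function names (select_index, embed_bit, extract_bit), and the assumption that both parties can rebuild the same pool for each sentence position are all hypothetical choices made for illustration. The pool entries are placeholder strings rather than real Modern Greek paraphrases.

```python
import hashlib
import hmac


def select_index(key: bytes, position: int, pool_size: int) -> int:
    """Derive a pool index from the shared key and the sentence position.

    Assumption: HMAC-SHA256 stands in for whatever keyed selection the
    original system uses.
    """
    digest = hmac.new(key, str(position).encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % pool_size


def _ordered_pool(pool):
    """Deduplicate and sort the pool so both parties see the same order."""
    ordered = sorted(set(pool))
    if len(ordered) < 2:
        raise ValueError("pool needs at least two distinct sentences")
    return ordered


def embed_bit(key: bytes, position: int, pool, bit: int) -> str:
    """Return the sentence to transmit: the chosen paraphrase for 1, another for 0."""
    ordered = _ordered_pool(pool)
    chosen = ordered[select_index(key, position, len(ordered))]
    if bit == 1:
        return chosen
    # Any pool member other than the chosen one signals a 0 bit.
    return next(s for s in ordered if s != chosen)


def extract_bit(key: bytes, position: int, pool, received: str) -> int:
    """Recover the bit: 1 if the received sentence is the chosen paraphrase."""
    ordered = _ordered_pool(pool)
    chosen = ordered[select_index(key, position, len(ordered))]
    return 1 if received == chosen else 0


if __name__ == "__main__":
    shared_key = b"example shared key"  # hypothetical key
    pool = ["original sentence", "paraphrase A", "paraphrase B"]  # placeholders
    for bit in (0, 1):
        sent = embed_bit(shared_key, 0, pool, bit)
        print(bit, extract_bit(shared_key, 0, pool, sent))
```

In this sketch a larger pool gives an observer without the key more candidates to consider when guessing which sentence is the chosen one, which mirrors the role the abstract assigns to pool size.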
