The Secret's in the Word Order: Text-to-Text Generation for Linguistic Steganography

Linguistic steganography is a form of covert communication using natural language to conceal the existence of the hidden message, which is usually achieved by systematically making changes to a cover text. This paper proposes a linguistic steganography method using word ordering as the linguistic transformation. We show that the word ordering technique can be used in conjunction with existing translation-based embedding algorithms. Since unnatural word orderings would arouse the suspicion of third parties and diminish the security of the hidden message, we develop a method using a maximum entropy classifier to determine the naturalness of sentence permutations. The classifier is evaluated by human judgements and compared with a baseline method using the Google n-gram corpus. The results show that our proposed system can achieve a satisfactory security level and embedding capacity for the linguistic steganography application.
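As a rough illustration of how word order can carry hidden bits, the sketch below ranks the acceptable orderings of a short sentence and selects the one whose rank spells out the secret bits. It is a simplified toy and not the paper's embedding algorithm: the function names are hypothetical, and `is_natural` is a placeholder for a naturalness check such as the maximum entropy classifier or the n-gram baseline mentioned above. Enumerating all permutations is only feasible for very short sentences.

```python
from itertools import permutations


def candidate_orderings(words, is_natural):
    """List the distinct orderings of `words` that `is_natural` accepts.

    Sorting makes the list depend only on the multiset of words, so sender
    and receiver derive the same list from any ordering of the sentence.
    """
    return [p for p in sorted(set(permutations(words))) if is_natural(p)]


def capacity_bits(candidates):
    """Number of whole bits that one choice among `candidates` can carry."""
    return max(len(candidates).bit_length() - 1, 0)


def embed(words, bits, is_natural):
    """Pick the ordering whose rank equals the leading bits of `bits`.

    Returns (stego_sentence, number_of_bits_consumed).
    """
    candidates = candidate_orderings(words, is_natural)
    n = capacity_bits(candidates)
    if n == 0 or len(bits) < n:
        raise ValueError("cannot embed: too few orderings or too few bits")
    return " ".join(candidates[int(bits[:n], 2)]), n


def extract(sentence, is_natural):
    """Recover the embedded bits from the received word order."""
    words = tuple(sentence.split())
    candidates = candidate_orderings(words, is_natural)
    n = capacity_bits(candidates)
    if n == 0:
        return ""
    rank = candidates.index(words)  # ValueError if the ordering is not a candidate
    if rank >= 2 ** n:
        raise ValueError("ordering lies outside the usable code range")
    return format(rank, "0{}b".format(n))


def toy_is_natural(ordering):
    """Hypothetical stand-in for a classifier: reject orderings ending in 'she'."""
    return ordering[-1] != "she"


stego, used = embed(["she", "walked", "slowly"], "10", toy_is_natural)
print(stego, used)                      # slowly she walked 2
print(extract(stego, toy_is_natural))   # 10
```

Both parties must share the same naturalness check; otherwise their candidate lists, and therefore the decoded bits, would differ. Only the first 2^n candidates are used, so any remaining orderings carry no code.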

Stephen Clark | Ching-Yun Chang
