Adjective Deletion for Linguistic Steganography and Secret Sharing

This paper describes two methods for checking the acceptability of adjective deletion in noun phrases. The first method uses the Google n-gram corpus to check the fluency of the remaining context after an adjective is removed. The second method trains an SVM model, using n-gram counts and other measures as features, to classify adjectives in context as deletable or undeletable. Both methods are evaluated against human judgements of sentence naturalness. The application motivating our interest in adjective deletion is data hiding, in particular linguistic steganography. We demonstrate that the proposed adjective deletion technique can be integrated into an existing stegosystem and, in addition, propose a novel secret sharing scheme based on adjective
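
The idea behind the first method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring function, so the n-gram order (`max_n`), the count threshold (`min_count`), the helper names, and the toy count table standing in for the Google n-gram corpus are all assumptions made for illustration. The sketch removes the adjective and accepts the deletion only if every n-gram spanning the join in the remaining text is attested in the count table.

```python
from typing import Dict, List, Tuple

NGramCounts = Dict[Tuple[str, ...], int]


def spanning_ngrams(tokens: List[str], join: int, n: int) -> List[Tuple[str, ...]]:
    """Return every n-gram that contains both tokens[join - 1] and tokens[join]."""
    grams = []
    for start in range(max(0, join - n + 1), join):
        end = start + n
        if end <= len(tokens):
            grams.append(tuple(tokens[start:end]))
    return grams


def deletion_is_fluent(
    tokens: List[str],
    adj_index: int,
    counts: NGramCounts,
    max_n: int = 3,      # assumed n-gram order, not taken from the paper
    min_count: int = 1,  # assumed threshold, not taken from the paper
) -> bool:
    """Check whether the text left after deleting tokens[adj_index] looks fluent."""
    remaining = tokens[:adj_index] + tokens[adj_index + 1:]
    join = adj_index  # index of the first token after the deleted adjective
    if join == 0 or join >= len(remaining):
        return False  # conservative: no context on one side of the join
    for n in range(2, max_n + 1):
        for gram in spanning_ngrams(remaining, join, n):
            if counts.get(gram, 0) < min_count:
                return False
    return True


# Toy counts standing in for a web-scale n-gram corpus (hypothetical values).
toy_counts: NGramCounts = {
    ("a", "car"): 1000,
    ("bought", "a", "car"): 50,
}

ok = deletion_is_fluent("he bought a red car".split(), 3, toy_counts)
bad = deletion_is_fluent("he is an old friend".split(), 3, toy_counts)
```

Here deleting "red" leaves "he bought a car", whose spanning n-grams are all in the toy table, while deleting "old" leaves "he is an friend", where the bigram ("an", "friend") is unattested and the deletion is rejected. A real system would query corpus counts instead of a dictionary and would probably use a graded score rather than a hard threshold.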
