Neural Linguistic Steganography

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals the fact that communication is taking place by encoding a secret message into a cover signal. Language is a particularly pragmatic cover signal because it occurs naturally in benign contexts and does not depend on any one medium. Traditionally, linguistic steganography systems encode secret messages in existing text via synonym substitution or word-order rearrangement. Advances in neural language models enable generation-based techniques that were previously impractical. We propose a steganography technique based on arithmetic coding with large-scale neural language models. We find that our approach can generate realistic-looking cover sentences, as evaluated by humans, while preserving security by matching the cover message distribution to the language model distribution.
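The idea of pairing arithmetic coding with a language model can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_lm` is a hypothetical stand-in for a neural language model, exact fractions replace the fixed-precision arithmetic a practical coder would use, and the sender and receiver are assumed to share both the model and the message length in advance. The secret bits select a sub-interval of [0, 1), and each generated token narrows the current interval according to the model's probabilities until it lies inside that target.

```python
from fractions import Fraction
import math

def toy_lm(context):
    """Hypothetical stand-in for a neural LM: (token, probability) pairs."""
    if len(context) % 2 == 0:
        return [("the", Fraction(1, 2)), ("a", Fraction(1, 4)), ("some", Fraction(1, 4))]
    return [("cat", Fraction(1, 4)), ("dog", Fraction(1, 4)), ("bird", Fraction(1, 2))]

def narrow(lo, hi, context, pick):
    """Split [lo, hi) by the model's probabilities; return the piece `pick` accepts."""
    width, start = hi - lo, lo
    for token, p in toy_lm(context):
        end = start + width * p
        if pick(token, start, end):
            return token, start, end
        start = end
    raise ValueError("no token matched")

def encode(bits):
    """Map a non-empty bit string to cover tokens."""
    n, m = len(bits), int(bits, 2)
    # All numbers in [m / 2^n, (m + 1) / 2^n) share the n-bit prefix `bits`.
    target_lo, target_hi = Fraction(m, 2**n), Fraction(m + 1, 2**n)
    point = (target_lo + target_hi) / 2
    lo, hi, tokens = Fraction(0), Fraction(1), []
    # Emit tokens until the interval sits entirely inside the target.
    while not (target_lo <= lo and hi <= target_hi):
        token, lo, hi = narrow(lo, hi, tokens, lambda t, s, e: s <= point < e)
        tokens.append(token)
    return tokens

def decode(tokens, n):
    """Replay the tokens through the same model and read back n bits."""
    lo, hi, context = Fraction(0), Fraction(1), []
    for tok in tokens:
        _, lo, hi = narrow(lo, hi, context, lambda t, s, e: t == tok)
        context.append(tok)
    return format(math.floor(lo * 2**n), f"0{n}b")

cover = encode("1011")
print(" ".join(cover), "->", decode(cover, 4))
```

In this sketch a token with probability p is chosen whenever the message point falls in a sub-interval of relative width p, so uniformly random message bits select tokens roughly in proportion to the model's probabilities. That is the intuition behind matching the cover distribution to the language model distribution.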
