Blind Linguistic Steganalysis against Translation Based Steganography

Translation-based steganography (TBS) is a relatively new and secure form of linguistic steganography. It exploits the "noise" introduced by automatic translation of natural language text to encode secret information. To date, there has been little research on steganalysis against this kind of linguistic steganography. This paper presents a blind steganalytic method named natural frequency zoned word distribution analysis (NFZ-WDA). The method improves on a previously proposed linguistic steganalysis method based on word distribution, which targets linguistic steganography such as nicetext and texto. The new method aims to detect the use of TBS without relying on any information about TBS itself. Its only resource is a word frequency dictionary obtained from a large corpus, the so-called natural frequency dictionary, so it is completely blind. To verify the effectiveness of NFZ-WDA, two experiments are carried out, using two-class and multi-class SVM classifiers respectively. The experimental results show that the steganalytic method is promising.
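
The abstract does not say how NFZ-WDA defines its zones or which statistics it extracts, so the following is only a minimal sketch of the general idea of "natural frequency zoned" word distribution features, under stated assumptions. It assumes that zones are contiguous ranges of a word's rank in the natural frequency dictionary, and that the feature for each zone is the share of a text's tokens that fall in it (with an extra zone for words absent from the dictionary). The function names `build_rank_table` and `zoned_features` and the example boundary values are hypothetical illustrations, not the paper's definitions. The resulting vectors would then be handed to a two-class or multi-class SVM; that classifier step is omitted here.

```python
from bisect import bisect_right
from collections.abc import Iterable

# ASSUMPTION: illustrative sketch only; zone definition and features are
# hypothetical and not taken from the NFZ-WDA paper.


def build_rank_table(natural_freq: dict[str, int]) -> dict[str, int]:
    """Map each word to its 0-based rank in a natural frequency dictionary
    (rank 0 = most frequent word in the reference corpus; ties broken
    alphabetically so the ranking is deterministic)."""
    ordered = sorted(natural_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered)}


def zoned_features(tokens: Iterable[str], ranks: dict[str, int],
                   boundaries: list[int]) -> list[float]:
    """Return the fraction of tokens falling in each natural-frequency zone.

    boundaries are sorted rank cut-offs: [10, 100] yields the zones
    [0, 10), [10, 100), [100, inf), plus a final out-of-dictionary zone.
    """
    tokens = list(tokens)
    counts = [0] * (len(boundaries) + 2)  # rank zones + 1 OOV slot
    for tok in tokens:
        rank = ranks.get(tok.lower())
        if rank is None:
            counts[-1] += 1  # word not in the natural frequency dictionary
        else:
            counts[bisect_right(boundaries, rank)] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy dictionary standing in for one built from a large corpus.
    natural = {"the": 1000, "of": 800, "and": 700, "translation": 5, "noise": 3}
    ranks = build_rank_table(natural)
    print(zoned_features("The noise of the xyzzy".split(), ranks, [2, 4]))
```

With the toy dictionary above, "the" and "of" fall in the top zone, "noise" in the third zone, and "xyzzy" in the out-of-dictionary zone, giving `[0.6, 0.0, 0.2, 0.2]`. Using shares rather than raw counts keeps feature vectors comparable across texts of different lengths, which matters when they are fed to an SVM.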
