A Statistical Algorithm for Linguistic Steganography Detection Based on Distribution of Words

In this paper, we present a novel statistical algorithm for linguistic steganography detection that exploits the distribution of words in the text segment under examination. Linguistic steganography is the art of using written natural language to conceal the very presence of secret messages. Because its carrier is text data, the foundational medium of Internet communication, linguistic steganography plays an important part in the field of Information Hiding (IH). Previous work has focused mainly on linguistic steganography itself, and there has been little research on linguistic steganalysis. We attempt to help fill this gap. In our experiments on detecting three linguistic steganography methods (NICETEXT, TEXTO, and Markov-chain-based), the total accuracies in identifying stego-text segments and normal text segments are 87.39%, 95.51%, 98.50%, 99.15%, and 99.57% when the segment size is 5 kB, 10 kB, 20 kB, 30 kB, and 40 kB, respectively. Our research shows that linguistic steganalysis based on the distribution of words is promising.