Novel Text Steganography Using Natural Language Processing and Part-of-Speech Tagging

Steganography is the practice of transmitting secret data by hiding it in cover data. Several versatile steganographic methods are now available that use different digital objects (e.g. image, audio, video, text) as the cover in which data is hidden. Accordingly, many steganalysis methods have been explored to reveal statistical anomalies in the stego object, through which the presence of secret data can be detected. However, applying steganography and steganalysis techniques is challenging when the cover object is text, as text does not have any redundant bits. In this paper, an innovative text steganography approach is proposed that uses natural language text as both the cover and the secret message. A shared key is also used, which holds the count of each part of speech in the secret message. This key is RSA-encrypted and shared between the communicating parties. The stego produced by this method is also natural language text. The method is successful because the stego broadly preserves the original meaning of the text, which makes it robust and undetectable. It shows good results in capacity ratio; the similarity index has also been assessed using the Jaro–Winkler distance and the generalized Levenshtein distance.
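
To make the quantities named in the abstract concrete, the following Python sketch shows one plausible way to build a part-of-speech count key and to compute the two string similarity measures. It is an illustration under stated assumptions, not the paper's implementation: the function names (`pos_count_key`, `levenshtein`, `jaro`, `jaro_winkler`) are hypothetical, the input is assumed to be already tagged with Penn Treebank style tags, the Levenshtein distance uses unit costs (a generalized variant may weight insertions, deletions and substitutions differently), and the Jaro–Winkler function returns a similarity in [0, 1], from which a distance can be taken as one minus that value. RSA encryption of the key is omitted.

```python
from collections import Counter


def pos_count_key(tagged_tokens):
    """Count how often each part-of-speech tag occurs.

    `tagged_tokens` is a list of (word, tag) pairs; the tagging step itself
    is assumed to have been done elsewhere.
    """
    return dict(Counter(tag for _word, tag in tagged_tokens))


def levenshtein(a, b):
    """Edit distance with unit costs (assumption: no custom weights)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a)
            + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    secret = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("the", "DT")]
    print(pos_count_key(secret))
    print(levenshtein("kitten", "sitting"))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In this sketch a cover text and its stego text would be passed to `levenshtein` and `jaro_winkler` as plain strings; a small edit distance and a similarity close to 1 would indicate that the stego stays close to the cover.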
