UniSpaCh: A text-based data hiding method using Unicode space characters

This paper proposes a text-based data hiding method for inserting external information into Microsoft Word documents. First, the low embedding efficiency of existing text-based data hiding methods is addressed, and a simple attack, DASH, is proposed to reveal the information inserted by those methods. Then, a new data hiding method, UniSpaCh, is proposed to counter DASH. The characteristics of Unicode space characters with respect to embedding efficiency and DASH are analyzed, and the selected Unicode space characters are inserted into inter-sentence, inter-word, end-of-line and inter-paragraph spacings to encode external information while improving the embedding efficiency and the imperceptivity of the embedded information. UniSpaCh is also reversible: the embedded information can be removed to completely reconstruct the original Microsoft Word document. Experiments were carried out to verify the performance of UniSpaCh and to compare it with existing space-manipulating data hiding methods. Results suggest that UniSpaCh offers higher embedding efficiency and higher imperceptivity of white space manipulation than the existing methods considered. In the best-case scenario, UniSpaCh produces an output document almost 9 times smaller than that of the existing method.
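The general idea of encoding bits with Unicode space characters, and of removing them again to recover the original text, can be illustrated with a minimal Python sketch. It is not the UniSpaCh algorithm itself: the four characters in `SPACES`, the 2-bits-per-gap mapping, and the function names `embed`, `extract` and `restore` are all assumptions made for illustration. The sketch works on a plain string rather than a Word document, uses only inter-word spacings, and assumes the cover text does not already contain any of the chosen characters.

```python
# Hypothetical 2-bit alphabet of narrow Unicode spaces (an assumption for
# illustration, not the mapping defined in the paper).
SPACES = ["\u2009", "\u200a", "\u2006", "\u2008"]  # thin, hair, six-per-em, punctuation
INDEX = {ch: i for i, ch in enumerate(SPACES)}


def embed(cover: str, bits: str) -> str:
    """Append one Unicode space after each ordinary space, 2 bits per gap."""
    if len(bits) % 2 or set(bits) - {"0", "1"}:
        raise ValueError("bits must be an even-length string of 0s and 1s")
    if any(ch in INDEX for ch in cover):
        raise ValueError("cover text already contains a carrier character")
    out = []
    pos = 0
    for ch in cover:
        out.append(ch)
        if ch == " " and pos < len(bits):
            out.append(SPACES[int(bits[pos:pos + 2], 2)])
            pos += 2
    if pos < len(bits):
        raise ValueError("cover text has too few spaces for the payload")
    return "".join(out)


def extract(stego: str) -> str:
    """Read the payload back from the carrier characters, in order."""
    return "".join(format(INDEX[ch], "02b") for ch in stego if ch in INDEX)


def restore(stego: str) -> str:
    """Remove every carrier character to recover the original cover text."""
    return "".join(ch for ch in stego if ch not in INDEX)


if __name__ == "__main__":
    cover = "the quick brown fox jumps"
    stego = embed(cover, "01101100")
    print(extract(stego) == "01101100", restore(stego) == cover)
```

In this sketch each inter-word gap carries two bits at the cost of one extra character, and `restore` shows the reversibility property in its simplest form: stripping the carrier characters yields the cover text unchanged.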
