Traitor-Proof PDF Watermarking

This paper presents a traitor-tracing technique based on watermarking digital documents, PDF files in particular. The watermarking algorithm chains three separate techniques that work in synergy. The embedded payload can withstand a wide range of attacks and cannot be removed without undermining the credibility of the document. We present an implementation of the approach and discuss its limitations with respect to which documents can be watermarked and the quality of the watermarked documents. We also analyse two payload alternatives and how the encryption scheme may alleviate the chilling effect on whistle-blowing.
