When Document Security Brings New Challenges to Document Analysis

It is easy to ensure the authenticity of a document that remains digital or one that remains on paper. However, this security is seriously weakened when the document crosses the border between the material and the digital world, that is, when it is printed or scanned. This paper presents the beginning of our work towards a document signature that would solve this security issue. Our primary finding is that current state-of-the-art document analysis algorithms need to be re-evaluated under the criterion of robustness, as we have done for OCR processing.
