Document Authentication Using Printing Technique Features and Unsupervised Anomaly Detection

Automatically identifying that a page in a set of documents was printed with a different printer than the rest can provide an important clue to a possible forgery attempt. Printers differ in the print quality they produce, and this is especially noticeable at the edges of printed characters. This paper presents a system that uses the difference in edge roughness to distinguish laser-printed pages from inkjet-printed pages. Several feature extraction methods have been developed and evaluated for that purpose. In contrast to previous work, the system uses unsupervised anomaly detection to find documents in a set that were printed with a different printing technique than the majority. The advantage of this approach is that no prior training on genuine documents is required. Furthermore, we created a dataset of 1200 document images from different domains (invoices, contracts, scientific papers) printed by 7 different inkjet printers and 13 laser printers. Results show that the presented feature extraction method achieves the best outlier rank score in comparison to state-of-the-art features.
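
The unsupervised idea can be illustrated with a minimal sketch. It assumes each page has already been reduced to a single hypothetical edge-roughness value; the actual features and the anomaly detector used in the paper are not specified here. The robust median/MAD score below is a stand-in chosen for illustration, and the names `outlier_scores` and `rank_outliers` are assumptions, not part of the described system.

```python
from statistics import median


def outlier_scores(features):
    """Score each page by its robust deviation from the majority.

    `features` holds one (hypothetical) edge-roughness value per page.
    A higher score means the page differs more from the bulk of the set.
    """
    med = median(features)
    # Median absolute deviation: a spread estimate that a few outliers
    # cannot inflate, unlike the standard deviation.
    mad = median(abs(x - med) for x in features)
    scale = mad if mad > 0 else 1.0  # avoid division by zero
    return [abs(x - med) / scale for x in features]


def rank_outliers(features):
    """Return page indices ordered from most to least anomalous."""
    scores = outlier_scores(features)
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Illustrative values only: five pages with similar roughness and one
# page (index 4) that is much rougher than the rest.
roughness = [0.21, 0.19, 0.22, 0.20, 0.58, 0.18]
ranking = rank_outliers(roughness)
print(ranking[0])
```

No labeled training data is needed: the majority of pages in the set defines what counts as normal, and pages are ranked only by how far they deviate from it.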
