A dataset for forgery detection and spotting in document images

In recent decades, the explosion in the volume of digital document images and the development of consumer tools for modifying them have led to a huge increase in reported cases of document fraud. This situation has promoted the development of automatic methods for both preventing forgeries and detecting them in modified documents. However, document forensics is a sensitive topic: data is usually either private or unlabeled, and most reported works are evaluated on datasets with restricted access. In this paper we present a new public dataset consisting of a corpus of 477 corrupted payslips in which nearly 6000 characters were forged. Because it is provided with a reliable ground truth, we expect this dataset to be useful for many works in the digital forensics research domain.
