Clustering document fragments using background color and texture information

Forensic analysis of questioned documents can be highly data intensive. A forensic expert may need to analyze a large heap of document fragments, and to ensure reliability the expert should focus only on the relevant evidence hidden in those fragments. Retrieving relevant documents requires finding similar document fragments. One way to find such fragments is to use their physical characteristics, such as color and texture. In this article we propose an automatic scheme that retrieves similar document fragments based on the visual appearance of the document paper, namely its color and texture. Multispectral color characteristics are captured using biologically inspired color differentiation techniques, implemented by projecting the document color characteristics to the Lab color space. Gabor filter-based texture analysis is used to characterize the document texture. Document fragments from the same source are expected to have similar color and texture. To cluster the similar document fragments of our test dataset, we use a Self-Organizing Map (SOM) of dimension 5×5, with the document color and texture information as features. We obtained an encouraging accuracy of 97.17% on 1063 test images.
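The pipeline described above (Lab color features, Gabor texture features, clustering on a 5×5 SOM) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice of mean Lab color and mean absolute Gabor response as features, the four orientations, the kernel size and wavelength, and the SOM learning schedule are all assumptions made for illustration; the abstract specifies only the Lab color space, Gabor filtering, and the 5×5 map size.

```python
import math
import random

def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (0-255 per channel) to CIE Lab, D65 white point."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> XYZ, normalised by the D65 reference white.
    x = (0.4124 * rl + 0.3576 * gl + 0.1805 * bl) / 0.95047
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = (0.0193 * rl + 0.1192 * gl + 0.9505 * bl) / 1.08883
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def gabor_energy(gray, theta, wavelength=4.0, sigma=2.0, radius=3):
    """Mean absolute response of a zero-mean, real-valued Gabor kernel.

    `gray` is a list of rows and must be larger than 2 * radius in both
    directions. All default parameter values are illustrative assumptions.
    """
    offsets = range(-radius, radius + 1)
    kernel = {
        (dy, dx): math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
        * math.cos(2 * math.pi * (dx * math.cos(theta) + dy * math.sin(theta)) / wavelength)
        for dy in offsets for dx in offsets
    }
    mean = sum(kernel.values()) / len(kernel)  # subtract so flat paper gives ~0
    total, count = 0.0, 0
    for y in range(radius, len(gray) - radius):
        for x in range(radius, len(gray[0]) - radius):
            resp = sum((k - mean) * gray[y + dy][x + dx] for (dy, dx), k in kernel.items())
            total += abs(resp)
            count += 1
    return total / count

def fragment_features(pixels, orientations=4):
    """Assumed feature vector: mean Lab color, then Gabor energies per orientation."""
    labs = [rgb_to_lab(*p) for row in pixels for p in row]
    mean_lab = [sum(channel) / len(labs) for channel in zip(*labs)]
    lightness = [[rgb_to_lab(*p)[0] / 100.0 for p in row] for row in pixels]
    energies = [gabor_energy(lightness, k * math.pi / orientations)
                for k in range(orientations)]
    return mean_lab + energies

def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    return min(weights, key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], x)))

def train_som(samples, rows=5, cols=5, epochs=50, lr0=0.5, radius0=2.5, seed=0):
    """Train a rows x cols SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    weights = {(r, c): list(rng.choice(samples)) for r in range(rows) for c in range(cols)}
    steps, step = epochs * len(samples), 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):
            frac = 1.0 - step / steps
            lr, radius = lr0 * frac, max(radius0 * frac, 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood around the winning node on the grid.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

In use, each fragment image would be reduced to one feature vector with `fragment_features`, the map would be trained on all vectors with `train_som`, and fragments mapping to the same node via `best_matching_unit` would be treated as one cluster. The sketch does not rescale features, so the Lab values dominate the Gabor energies in the distance computation; a real system would likely normalize them first.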
