Visual appearance based document classification methods: Performance evaluation and benchmarking

Most traditional document image classification techniques rely on document segmentation and OCR analysis, despite the many complexities and limitations these steps involve. Recently, many document image classification problems have been solved easily just by adapting standard computer vision approaches for natural image retrieval and classification; these are referred to as visual appearance based document classification techniques. These approaches have reported better results than the traditional approaches on proprietary datasets. However, they have not so far been compared with each other and, despite their potential, they have not been evaluated on distorted camera-captured documents, which is one of the challenging requirements in our current commercial document analysis projects. In this paper, we present simple and effective descriptions of different visual appearance based document image classification techniques. We compare their performance on various standard, publicly available datasets that differ in degree of image degradation and content variation. We also demonstrate their advantages and limitations. Additionally, we make our implementations of these methods publicly available to the research community for use and further testing in other domains.
