Open world classification of printed invoices

A key step in understanding printed documents is classifying them by the nature of the information they contain and by their layout. In this work we consider a dynamic scenario in which document classes are not known a priori and new classes can appear at any time. This open world setting is both realistic and highly challenging. We use an SVM classifier that relies only on image-level features, together with a nearest-neighbor approach for detecting new classes. We assess our proposal on a real-world dataset composed of 562 invoices belonging to 68 different classes. These documents were digitized after being handled in a corporate environment, so they are quite noisy: for example, large stamps and handwritten signatures appear at unfortunate positions. The experimental results are highly promising.
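The open world decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the feature extraction, the distance measure, or the rejection rule, so the Euclidean distance, the fixed `threshold`, the toy 2-D feature vectors, and every function name below are assumptions made for illustration. The known-class classifier is passed in as a callable; the paper uses an SVM there, while this sketch substitutes a simple nearest-label lookup (`nearest_label`, hypothetical) to stay dependency-free.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def nearest_distance(x: Vector, known: Sequence[Vector]) -> float:
    """Euclidean distance from x to its nearest neighbor among known samples."""
    return min(math.dist(x, k) for k in known)


def open_world_predict(
    x: Vector,
    known: Sequence[Vector],
    threshold: float,
    classify_known: Callable[[Vector], str],
) -> Optional[str]:
    """Return a known class label, or None when x looks like a new class.

    Assumption: a sample is treated as novel when its nearest known
    neighbor is farther away than `threshold`.
    """
    if nearest_distance(x, known) > threshold:
        return None  # too far from everything seen so far: flag as new class
    return classify_known(x)


# Toy feature vectors standing in for image-level features (assumed data).
train_x = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
train_y = ["A", "A", "B", "B"]


def nearest_label(x: Vector) -> str:
    """Stand-in for the SVM stage: label of the closest training sample."""
    i = min(range(len(train_x)), key=lambda j: math.dist(x, train_x[j]))
    return train_y[i]


print(open_world_predict((0.05, 0.1), train_x, 1.0, nearest_label))
print(open_world_predict((20.0, 20.0), train_x, 1.0, nearest_label))
```

In this sketch, a document flagged as novel (result `None`) would be the point at which a new class is created and the known-class classifier is updated; how the threshold is chosen is left open here.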
