An adaptive system for automatic invoice-documents classification

The large number of documents that modern offices must manage every day calls for automatic document classification tools that (semi)automatically sort office documents into semantically similar classes. This paper presents an automatic invoice-document classification system that is based on analysis of the graphical information present in the document and that supports both closed-world classification (the number of classes is fixed) and open-world classification (the number of classes grows during operational life). Experiments on invoice documents from real companies show that the system achieves 99% correct classification in the closed-world case and 79% in the open-world case.
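The difference between the closed-world and open-world settings can be illustrated with a minimal sketch. This is not the paper's method: it assumes each document has already been reduced to a numeric feature vector, and it swaps in a plain nearest-centroid rule with a distance threshold for rejection. The function names, the `threshold` parameter, and the `class_N` labels are all hypothetical.

```python
import math

def nearest_class(features, centroids):
    """Return (label, distance) of the centroid closest to the feature vector."""
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        dist = math.dist(features, centroid)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

def classify_closed(features, centroids):
    """Closed world: the class set is fixed, so always pick the nearest class."""
    label, _ = nearest_class(features, centroids)
    return label

def classify_open(features, centroids, threshold):
    """Open world: accept the nearest class only if it is close enough.

    Otherwise treat the document as the first sample of a new class and
    register it, so the number of classes grows over time. The threshold
    is an assumed tuning parameter, not a value taken from the paper.
    """
    label, dist = nearest_class(features, centroids)
    if dist <= threshold:
        return label
    new_label = f"class_{len(centroids)}"  # hypothetical naming scheme
    centroids[new_label] = tuple(features)
    return new_label

# Two known classes described by toy 2-D feature vectors.
centroids = {"class_0": (0.0, 0.0), "class_1": (10.0, 10.0)}
far_document = (100.0, 100.0)

closed_label = classify_closed(far_document, centroids)
open_label = classify_open(far_document, centroids, threshold=5.0)
```

In the closed-world call the distant document is forced into an existing class, whereas the open-world call creates a new one. This is one plausible reason open-world accuracy is lower in general: the classifier must also decide whether a document belongs to any known class at all.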
