A Compliant Document Image Classification System Based on One-Class Classifier

Document image classification in a professional context must satisfy several constraints, such as handling a large variability of documents and/or a large number of classes. Whereas most methods handle all classes at the same time, we address this problem with a new compliant system that specializes the features and parametrizes the classifier separately for each class. We first compute a generalized feature vector based on global image characterization and structural primitives. Then, for each class, the feature vector is specialized by ranking the features according to a stability score. Finally, a one-class k-NN classifier is trained on these class-specific features. Experiments yield good classification rates, demonstrating the ability of our system to deal with a wide range of document classes.
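The per-class pipeline described above can be illustrated with a minimal sketch. The abstract defines neither the stability score nor the one-class k-NN decision rule, so both are assumptions here. Stability is taken as the ratio of a feature's mean magnitude to its within-class standard deviation. A sample is accepted by a class when its distance to the k-th nearest training sample does not exceed a threshold, which is set at a quantile of the training set's leave-one-out k-th neighbour distances. The names `stability_scores`, `OneClassKNN` and `classify`, and the parameters `k`, `n_features` and `quantile`, are hypothetical.

```python
import numpy as np


def stability_scores(X):
    """Hypothetical stability score (an assumption, not the paper's definition):
    features that vary little within a class relative to their mean score higher."""
    mean = np.abs(X.mean(axis=0))
    std = X.std(axis=0)
    return mean / (std + 1e-9)  # epsilon avoids division by zero


def _pairwise(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))


class OneClassKNN:
    """Sketch of a one-class k-NN model trained on a single document class."""

    def __init__(self, k=3, n_features=5, quantile=0.95):
        self.k = k
        self.n_features = n_features
        self.quantile = quantile

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Specialize: keep only the most stable features for this class.
        order = np.argsort(-stability_scores(X))
        self.features_ = order[: self.n_features]
        Xs = X[:, self.features_]
        # Standardize using this class's own statistics.
        self.mu_ = Xs.mean(axis=0)
        self.sigma_ = Xs.std(axis=0) + 1e-9
        self.train_ = (Xs - self.mu_) / self.sigma_
        # Leave-one-out k-th neighbour distance of each training sample.
        d = _pairwise(self.train_, self.train_)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        kth = np.sort(d, axis=1)[:, self.k - 1]
        self.threshold_ = np.quantile(kth, self.quantile)
        return self

    def score(self, X):
        """Distance to the k-th nearest training sample (lower = more typical)."""
        X = np.asarray(X, dtype=float)
        Xs = (X[:, self.features_] - self.mu_) / self.sigma_
        d = _pairwise(Xs, self.train_)
        return np.sort(d, axis=1)[:, self.k - 1]

    def predict(self, X):
        """True where a sample is accepted as a member of this class."""
        return self.score(X) <= self.threshold_


def classify(models, x):
    """Return the accepting class with the smallest score/threshold ratio,
    or None if every one-class model rejects x."""
    x = np.atleast_2d(x)
    best, best_ratio = None, np.inf
    for label, model in models.items():
        ratio = model.score(x)[0] / model.threshold_
        if ratio <= 1.0 and ratio < best_ratio:
            best, best_ratio = label, ratio
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic classes of 20-dimensional feature vectors.
    A = rng.normal(5.0, 0.5, size=(40, 20))
    B = rng.normal(-5.0, 0.5, size=(40, 20))
    models = {"A": OneClassKNN().fit(A), "B": OneClassKNN().fit(B)}
    print(classify(models, A.mean(axis=0)))
    print(classify(models, np.full(20, 50.0)))  # far from both classes
```

Because each class has its own model, selected features and threshold, a new class can be added by fitting one more `OneClassKNN` without retraining the others, and a document that no model accepts is rejected rather than forced into the nearest class.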
