A New One-Class Classification Method Based on Symbolic Representation: Application to Document Classification

Training a system on a small number of instances while still obtaining accurate recognition/classification is a crucial need in the document classification domain. One-class classification is chosen because only positive samples are available for training. In this paper, a new one-class classification method based on symbolic representation is proposed. First, a set of features is extracted from the training set. The class is then represented by an interval-valued symbolic feature vector, in which each interval (symbolic datum) is computed from the mean and standard deviation of the corresponding feature values. To evaluate the proposed method, a dataset of 544 document images was used. Experimental results show that the proposed method works well even when the number of training samples is small (≤10). Moreover, the proposed method is suitable for document classification and gives better results than a one-class k-nearest neighbor (k-NN) classifier.
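The interval-valued representation described above can be illustrated with a minimal sketch. The abstract does not state how wide each interval is or how a test sample is matched against the intervals, so the following are assumptions made only for illustration: each interval is taken as [mean - alpha·std, mean + alpha·std], a sample is scored by the fraction of its features that fall inside their intervals, and it is accepted when that fraction reaches a threshold. The names `fit_intervals`, `acceptance_score`, `is_member`, `alpha`, and `threshold` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def fit_intervals(samples, alpha=1.0):
    """Build one (low, high) interval per feature from positive samples only.

    Assumption: interval = mean +/- alpha * (population) standard deviation.
    """
    columns = list(zip(*samples))  # transpose: one tuple of values per feature
    intervals = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        intervals.append((mu - alpha * sigma, mu + alpha * sigma))
    return intervals


def acceptance_score(x, intervals):
    """Fraction of the features of x that lie inside their class interval."""
    hits = sum(1 for v, (low, high) in zip(x, intervals) if low <= v <= high)
    return hits / len(intervals)


def is_member(x, intervals, threshold=0.5):
    """Assumed decision rule: accept x if enough features fall inside."""
    return acceptance_score(x, intervals) >= threshold


# Toy feature vectors standing in for features extracted from document images.
train = [
    [1.0, 10.0, 0.2],
    [1.2, 11.0, 0.1],
    [0.8, 9.0, 0.3],
    [1.0, 10.0, 0.2],
]
model = fit_intervals(train, alpha=2.0)

print(is_member([1.1, 10.5, 0.25], model))  # sample close to the training data
print(is_member([5.0, 30.0, 0.9], model))   # sample far from the training data
```

Because the class model is just one interval per feature, it can be estimated from a handful of positive samples, which is consistent with the small-training-set setting (≤10 samples) that the abstract targets. The choice of `alpha` and `threshold` would have to be tuned on real data.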
