Document Classification in a Non-stationary Environment: A One-Class SVM Approach

In this paper, we investigate a specific area of document classification in which documents arrive as a stream over time. Moreover, the number of document classes is not known in advance and may change over time. Classification in this setting requires classifiers that can learn incrementally and adapt their model as new data arrive. We focus on SVM approaches, which are known to perform well and for which incremental (i-SVM) procedures exist. However, most of these procedures can only handle a fixed number of classes. We therefore designed a new incremental learning procedure based on one-class SVMs. It improves its classification accuracy over time as new labeled data arrive, without any complete retraining. Moreover, when instances arrive with a previously unknown label (the appearance of a new class), the training procedure modifies the classifier model so that it recognizes this new kind of document. Pending the collection of a stream of document images, we ran initial experiments on the Optical Recognition of Handwritten Digits Data Set. These experiments show that our incremental approach (i) performs, at each time step, as well as a static one-class classifier fully retrained on all previously seen data, and (ii) models newly arriving classes quickly and efficiently.
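The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes scikit-learn's `OneClassSVM`, one model per label, and a common heuristic for incremental SVM training: keep each model's support vectors and refit on them together with the new batch. The class name `IncrementalOneClassEnsemble`, its parameters, and the choice to predict by the highest raw decision score are all hypothetical simplifications.

```python
import numpy as np
from sklearn.svm import OneClassSVM


class IncrementalOneClassEnsemble:
    """Hypothetical sketch: one one-class SVM per label, updated batch by batch."""

    def __init__(self, nu=0.1, gamma=0.1):
        self.nu = nu
        self.gamma = gamma
        self.models = {}    # label -> fitted OneClassSVM
        self.retained = {}  # label -> support vectors kept from earlier batches

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        for label in np.unique(y):
            batch = X[y == label]
            if label in self.retained:
                # Known class: refit on the retained support vectors plus the
                # new samples, rather than on all previously seen data.
                batch = np.vstack([self.retained[label], batch])
            # A previously unseen label falls through to here and simply
            # gets its own new one-class model.
            model = OneClassSVM(kernel="rbf", nu=self.nu, gamma=self.gamma)
            model.fit(batch)
            self.models[label] = model
            self.retained[label] = model.support_vectors_
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        labels = list(self.models)
        # Assumption: raw decision scores are comparable across models.
        scores = np.column_stack(
            [self.models[label].decision_function(X) for label in labels]
        )
        return np.asarray(labels)[scores.argmax(axis=1)]
```

Calling `partial_fit` repeatedly with successive labeled batches mimics the stream; a batch containing a new label adds a model without touching the others. To try it on digit data, scikit-learn's `load_digits` ships a copy of the test split of the Optical Recognition of Handwritten Digits Data Set.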
