A multi-one-class dynamic classifier for adaptive digitization of document streams

In this paper, we present a new dynamic classifier design based on a set of independent one-class SVMs for image data stream categorization. Dynamic, or continuous, learning and classification have recently been investigated to deal with different situations, such as online learning of fixed concepts, learning in non-stationary environments (concept drift), and learning from imbalanced data. Most solutions cannot handle several of these specificities at the same time. In particular, adding new concepts and merging or splitting existing concepts are usually considered less important and are consequently less studied, although they are of high interest for stream-based document image classification. To deal with this kind of data, we explore a learning and classification scheme based on one-class SVM classifiers, which we call mOC-iSVM (multi-one-class incremental SVM). Although one-class classifiers suffer from a lack of discriminative power, they offer in return many interesting properties that stem from their independent modeling. The experiments presented in the paper show the theoretical feasibility of the approach on different benchmarks involving the addition of new classes. They also demonstrate that the mOC-iSVM model can be used efficiently for document classification tasks (by image quality and by image content) in a stream context, handling many typical scenarios of concept extension, drift, split, and merge.
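To make the independent-modeling idea concrete, the following is a minimal sketch and not the mOC-iSVM implementation. It assumes one model per class and, to stay dependency-free, swaps the one-class SVM for a toy centroid-and-radius model. The names `CentroidOneClass` and `MultiOneClass`, the factor 2.0 in the boundary, and the 2-D feature vectors are all hypothetical choices made for illustration.

```python
import math


class CentroidOneClass:
    """Toy one-class model (an assumed stand-in, NOT an SVM):
    a centroid plus the mean distance of its samples to that centroid."""

    def __init__(self, samples):
        self.samples = [tuple(s) for s in samples]
        self._refit()

    def _refit(self):
        n = len(self.samples)
        dim = len(self.samples[0])
        self.center = tuple(sum(s[d] for s in self.samples) / n for d in range(dim))
        dists = [math.dist(s, self.center) for s in self.samples]
        self.scale = max(sum(dists) / n, 1e-9)  # avoid division by zero

    def update(self, samples):
        # Incremental step of this toy model: store the new samples and refit.
        self.samples.extend(tuple(s) for s in samples)
        self._refit()

    def score(self, x):
        # Positive inside the assumed boundary (2x mean radius), negative outside.
        return 1.0 - math.dist(x, self.center) / (2.0 * self.scale)


class MultiOneClass:
    """One independent one-class model per concept."""

    def __init__(self):
        self.models = {}

    def learn(self, label, samples):
        # A new class only creates a new model; existing models are untouched.
        if label in self.models:
            self.models[label].update(samples)
        else:
            self.models[label] = CentroidOneClass(samples)

    def merge(self, a, b, new_label):
        # Merging two concepts refits a single model on their pooled samples.
        pooled = self.models.pop(a).samples + self.models.pop(b).samples
        self.models[new_label] = CentroidOneClass(pooled)

    def predict(self, x):
        # Highest-scoring model wins; None means no model accepts the sample.
        label, best = max(
            ((name, m.score(x)) for name, m in self.models.items()),
            key=lambda pair: pair[1],
        )
        return label if best > 0 else None


clf = MultiOneClass()
clf.learn("invoice", [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)])
clf.learn("letter", [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)])
invoice_center = clf.models["invoice"].center

# Concept extension: a third class arrives later in the stream.
clf.learn("form", [(10.0, 0.0), (10.1, 0.2), (9.9, -0.1)])

print(clf.predict((0.1, 0.1)))    # invoice
print(clf.predict((10.0, 0.1)))   # form
print(clf.predict((20.0, 20.0)))  # None (rejected by every model)
```

Because each class has its own model, adding "form" leaves the "invoice" and "letter" models unchanged, and a sample rejected by every model can be flagged as a candidate new concept. The price is the one the abstract mentions: the models never see each other's data, so the decision boundary between two close classes is not explicitly optimized.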
