Incremental One-Class Bagging for Streaming and Evolving Big Data

Modern machine learning systems must process big data efficiently. Extracting useful patterns from massive collections of objects requires algorithms that are not only accurate but also fast, with limited computational complexity. However, the problem with massive datasets lies not only in their volume. A number of difficulties are embedded in the nature of the data and must be properly addressed in order to design an efficient learning system. In this paper we address multiple problems related to big data analytics. We assume that the data arrive as a stream. Additionally, we work in a non-stationary environment, where the nature of the data may change constantly. Finally, we consider a situation in which objects from only one class are available, which leads us to the one-class classification task. We propose a novel incremental ensemble of weighted one-class classifiers, based on bagging. Our learners adapt to the evolving nature of the data stream by changing the weights assigned to objects and forgetting outdated examples. The proposed bagging scheme diversifies the pool of individual classifiers, which can run in a distributed computing environment. We propose to maintain the diversity of the ensemble by updating each classifier with a bootstrap sample from the incoming stream. An experimental study demonstrates the usefulness of our approach in scenarios where massive and evolving data streams must be processed without access to counterexamples.
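The update loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a hypothetical stand-in (a weighted centroid with a distance threshold) rather than a weighted one-class classifier from the paper, and the names `CentroidOneClass`, `OneClassBagging`, `decay`, `min_weight`, and `slack` are all assumptions introduced for illustration. The sketch only shows the three ideas named above: each ensemble member is updated with its own bootstrap sample of the incoming chunk, object weights decay over time, and objects whose weight falls below a threshold are forgotten.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class CentroidOneClass:
    """Hypothetical stand-in base learner: weighted centroid plus a radius."""

    def __init__(self, decay=0.5, min_weight=0.05, slack=3.0):
        self.decay = decay            # multiplicative forgetting per update
        self.min_weight = min_weight  # objects below this weight are dropped
        self.slack = slack            # radius = slack * weighted mean distance
        self.objects = []             # stored (vector, weight) pairs
        self.center = None
        self.radius = 0.0

    def update(self, sample):
        # Forget: decay the weights of stored objects, drop outdated ones.
        aged = [(x, w * self.decay) for x, w in self.objects]
        self.objects = [(x, w) for x, w in aged if w >= self.min_weight]
        # Learn: new objects enter with full weight.
        self.objects.extend((x, 1.0) for x in sample)
        total = sum(w for _, w in self.objects)
        dim = len(self.objects[0][0])
        self.center = [
            sum(w * x[i] for x, w in self.objects) / total for i in range(dim)
        ]
        mean_dist = sum(w * _dist(x, self.center) for x, w in self.objects) / total
        self.radius = self.slack * mean_dist

    def accepts(self, x):
        # True if x falls inside the learned target-class region.
        return _dist(x, self.center) <= self.radius


class OneClassBagging:
    """Ensemble whose members each see a bootstrap sample of every chunk."""

    def __init__(self, n_members=5, seed=0, **learner_args):
        self.rng = random.Random(seed)
        self.members = [CentroidOneClass(**learner_args) for _ in range(n_members)]

    def update(self, chunk):
        for member in self.members:
            # Sampling with replacement keeps the members diverse.
            boot = self.rng.choices(chunk, k=len(chunk))
            member.update(boot)

    def accepts(self, x):
        # Majority vote over the members.
        votes = sum(m.accepts(x) for m in self.members)
        return votes * 2 > len(self.members)


def grid_chunk(c):
    """Toy chunk: 20 target-class points on a small grid around (c, c)."""
    return [(c + 0.1 * (i % 5 - 2), c + 0.1 * (i // 5 - 2)) for i in range(20)]
```

In this toy setting, feeding a few chunks from `grid_chunk(0.0)` and then several from `grid_chunk(10.0)` simulates a sudden drift: with the default `decay` and `min_weight`, objects from the old concept are dropped after five further updates, so the ensemble ends up accepting points near (10, 10) and rejecting points near (0, 0). Because the members are updated independently, the loop in `OneClassBagging.update` could in principle be distributed across workers, which is the property the abstract attributes to the bagging scheme.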
