Ensemble based data stream mining with recalling and forgetting mechanisms

Training an ensemble of classifiers on sequential chunks of training instances is a popular strategy for data stream mining. To address the limitations of existing approaches, we introduce recalling and forgetting mechanisms into ensemble-based data stream mining and propose a new algorithm, MAE (Memorizing based Adaptive Ensemble), for mining chunk-based data streams with concept drift. Ensemble pruning serves as the recalling mechanism: it selects the useful component classifiers for each incoming data chunk. The Ebbinghaus forgetting curve serves as the forgetting mechanism: it is used to evaluate and replace the component classifiers held in the memory repository. Experiments were performed on datasets with different types of concept drift. Compared with traditional ensemble approaches, the results show that MAE achieves high and stable accuracy, shorter prediction time, and moderate training time.
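The forgetting mechanism described above can be illustrated with a minimal sketch. It is not the MAE implementation: it assumes the commonly cited exponential form of the Ebbinghaus curve, R = exp(-t / S), and the names `MemoryItem`, `retention`, `recall`, and `insert_with_forgetting`, as well as the rule that memory strength grows by 1 on each recall, are hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """A stored component classifier (hypothetical structure)."""
    name: str               # identifier of the component classifier
    last_recalled: int      # index of the chunk at which it was last selected
    strength: float = 1.0   # memory strength S (assumed to grow on each recall)


def retention(item: MemoryItem, now: int) -> float:
    """Ebbinghaus-style retention R = exp(-t / S), with t in chunks."""
    elapsed = now - item.last_recalled
    return math.exp(-elapsed / item.strength)


def recall(item: MemoryItem, now: int) -> None:
    """Mark a classifier as selected for the current chunk (assumed update rule)."""
    item.last_recalled = now
    item.strength += 1.0


def insert_with_forgetting(repository: list, new_item: MemoryItem,
                           capacity: int, now: int) -> None:
    """Add a classifier; if the repository is full, forget the weakest one."""
    if len(repository) >= capacity:
        weakest = min(repository, key=lambda it: retention(it, now))
        repository.remove(weakest)
    repository.append(new_item)


# Usage: two stored classifiers, one recalled more often than the other.
repo = [MemoryItem("a", last_recalled=0, strength=1.0),
        MemoryItem("b", last_recalled=0, strength=3.0)]
insert_with_forgetting(repo, MemoryItem("c", last_recalled=3), capacity=2, now=3)
remaining = [item.name for item in repo]
```

In this sketch a classifier that is recalled often decays more slowly, so rarely used classifiers are the first to be replaced when the repository is full. How MAE actually parameterizes the curve is not stated in the abstract.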
