A method for automatic adjustment of ensemble size in stream data mining

In recent years, many new algorithms for data stream classification have been developed. Handling the different types of concept drift that occur in data streams has proved especially challenging. Ensemble methods have received much attention because of their desirable properties. However, deciding how many components an ensemble should contain remains an open problem. In this article we therefore present a theoretically justified method for determining the proper ensemble size automatically. The performance of the proposed algorithm was tested experimentally and compared with that of other known methods.
