Comparative study between incremental and ensemble learning on data streams: Case study

With the unbounded growth of real-world data and the increasing demand for real-time processing, immediate processing of big stream data has become an urgent problem. In stream data, hidden patterns commonly evolve over time (i.e., concept drift), and many dynamic learning strategies have been proposed to handle this, such as incremental learning and ensemble learning. To the best of our knowledge, no work has systematically compared these two methods. In this paper, we conduct a comparative study of these two learning methods. We first introduce the notion of “concept drift” and propose a way to measure it quantitatively. Then, we review the history of incremental learning and ensemble learning, introducing the milestones in their development. In experiments, we comprehensively compare and analyze their performance with respect to accuracy and time efficiency under various concept drift scenarios. We conclude with several possible future research problems.
