Online Active Learning Paired Ensemble for Concept Drift and Class Imbalance

Practical applications often require learning algorithms that can handle data streams exhibiting both concept drift and class imbalance. This paper proposes an online active learning paired ensemble for drifting streams with class imbalance. The paired ensemble consists of a long-term stable classifier and a dynamic classifier, which together address both sudden and gradual concept drift. To select the most representative instances for learning, a hybrid labeling strategy is proposed that combines an uncertainty strategy with an imbalance strategy. The uncertainty strategy applies a margin-based uncertainty criterion with a dynamically adjusted threshold. The imbalance strategy uses the class distribution of the last data block to favor learning instances of the minority class. It also incorporates the advantages of the traditional random strategy, which helps to capture drifts that occur away from the decision boundary. Experiments on real and synthetic datasets use prequential AUC as the evaluation metric and compare the classification performance of the proposed method with semi-supervised and supervised learning methods. The results show that the proposed method obtains higher AUC values at a lower labeling cost. Moreover, the labeling cost can be dynamically allocated according to the concept drift and the imbalance ratio.
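
The hybrid labeling strategy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HybridLabelingSketch`, every parameter and default value, the multiplicative threshold update, and the way the three criteria are combined are assumptions made for illustration only. The sketch queries a label when the margin between the two most probable classes falls below an adaptive threshold, when a small random budget fires, or (with some probability) when the predicted class is the minority class of the most recent block of labels.

```python
import random
from collections import Counter, deque


class HybridLabelingSketch:
    """Illustrative query strategy; all names and defaults are assumptions."""

    def __init__(self, threshold=0.5, step=0.01, random_rate=0.05,
                 imbalance_rate=0.5, block_size=100, seed=0):
        self.threshold = threshold            # current margin threshold
        self.step = step                      # relative threshold adjustment
        self.random_rate = random_rate        # chance of a purely random query
        self.imbalance_rate = imbalance_rate  # chance of querying a predicted minority instance
        self.recent_labels = deque(maxlen=block_size)  # labels of the last block
        self.rng = random.Random(seed)

    def observe_label(self, label):
        """Record a true label obtained from the oracle."""
        self.recent_labels.append(label)

    def minority_class(self):
        """Least frequent class in the last block, or None if the block is empty."""
        if not self.recent_labels:
            return None
        counts = Counter(self.recent_labels)
        return min(counts, key=counts.get)

    def should_query(self, probs):
        """Decide whether to request a label; probs maps class -> probability."""
        ranked = sorted(probs.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]
        predicted = max(probs, key=probs.get)

        # Random component: can catch changes far from the decision boundary.
        if self.rng.random() < self.random_rate:
            return True

        # Uncertainty component: margin test against an adaptive threshold.
        if margin < self.threshold:
            self.threshold *= 1 - self.step   # tighten after spending a label
            return True
        self.threshold *= 1 + self.step       # relax after a confident prediction

        # Imbalance component: favor instances predicted as the minority class.
        if predicted == self.minority_class():
            return self.rng.random() < self.imbalance_rate
        return False
```

In a prequential loop, `should_query` would be called on each incoming instance's predicted probabilities, and `observe_label` on every label that is actually acquired. Raising `random_rate` or `imbalance_rate` spends more of the labeling budget on instances the margin test alone would skip.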
