Active Learning with Abstaining Classifiers for Imbalanced Drifting Data Streams

Learning from data streams is one of the most promising and challenging domains in modern machine learning. Proliferating online data sources give us access to real-time knowledge that was previously unavailable. At the same time, new obstacles emerge that must be overcome to fully and effectively utilize the potential of the data. Prohibitive time and memory constraints and non-stationary distributions are only some of the problems. In classification tasks, effective adaptation must additionally be achieved on the weak foundation of partially labeled and often imbalanced data. In this work, we propose an online framework for binary classification that aims to handle the complex problem of working with dynamic, sparsely labeled, and imbalanced streams. Its main component is a novel active learning strategy (MD-OAL) that prioritizes the labeling of minority instances and, as a result, improves the balance of the learning process. We combine the strategy with a dynamic ensemble of base learners that can abstain from making decisions when they are highly uncertain. We adjust the abstaining mechanism in favor of minority instances, providing an effective method for handling the remaining imbalance and concept drift simultaneously. The conducted evaluation shows that, in challenging and realistic scenarios, our framework outperforms state-of-the-art algorithms, providing higher resilience to the combined effect of limited labeling and class imbalance.
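
To make the idea of an abstaining ensemble biased toward the minority class more concrete, the following is a minimal illustrative sketch, not the framework's actual algorithm. It assumes each base learner outputs an estimated probability of the minority class, and that a learner abstains when its confidence falls below a class-dependent threshold. The function name `abstaining_vote` and both threshold values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence

MAJORITY, MINORITY = 0, 1


def abstaining_vote(
    minority_probs: Sequence[float],
    majority_threshold: float = 0.7,   # hypothetical value, not from the paper
    minority_threshold: float = 0.55,  # hypothetical, lower to favor the minority
) -> Optional[int]:
    """Combine base learners, letting uncertain ones abstain.

    Each entry of `minority_probs` is one learner's estimated probability
    that the instance belongs to the minority class. A learner votes only
    if its confidence reaches the threshold for the class it predicts.
    Returns the winning class, or None if every learner abstains.
    """
    votes = {MAJORITY: 0.0, MINORITY: 0.0}
    for p in minority_probs:
        label = MINORITY if p >= 0.5 else MAJORITY
        confidence = p if label == MINORITY else 1.0 - p
        threshold = minority_threshold if label == MINORITY else majority_threshold
        if confidence < threshold:
            continue  # this learner abstains from the decision
        votes[label] += confidence  # confidence-weighted vote
    if votes[MAJORITY] == 0.0 and votes[MINORITY] == 0.0:
        return None  # the whole ensemble abstained
    return MINORITY if votes[MINORITY] >= votes[MAJORITY] else MAJORITY


if __name__ == "__main__":
    # Two weakly confident majority votes abstain; one minority vote remains.
    print(abstaining_vote([0.6, 0.35, 0.4]))
```

In this sketch, the lower minority threshold means that a moderately confident minority prediction is still counted, while an equally confident majority prediction is discarded. This is one simple way to skew an abstaining mechanism toward the minority class; the thresholds would need to be tuned or adapted online in practice.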
