Online Ensemble Learning for Imbalanced Data Streams

Cost-sensitive learning and online learning have each been studied extensively, but little work addresses the two problems simultaneously. To address this challenging task, this paper proposes a novel learning framework. The key idea is to fuse online ensemble algorithms with state-of-the-art batch-mode cost-sensitive bagging and boosting algorithms. The framework bridges two research areas that developed separately, and it yields, for the first time, a family of theoretically sound online cost-sensitive bagging and online cost-sensitive boosting algorithms. Unlike other online cost-sensitive learning algorithms, which lack theoretical analysis of their asymptotic properties, the proposed algorithms are guaranteed to converge under certain conditions, and experiments on benchmark data sets also validate the effectiveness and efficiency of the proposed methods.
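
To make the fusion idea concrete, the following is a minimal sketch and not the algorithms proposed in the paper, whose details the abstract does not give. It assumes the standard online bagging scheme, in which each base model trains on every incoming example k times with k drawn from a Poisson(1) distribution, and it assumes that cost sensitivity can be injected by scaling the Poisson rate with a per-class cost. The names `OnlineCostSensitiveBagging`, `RunningCentroid`, and `class_cost`, the nearest-centroid base learner, and the synthetic stream are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class RunningCentroid:
    """Hypothetical online base learner: predicts the nearest class mean."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet

        def sq_dist(y):
            n = self.counts[y]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[y], x))

        return min(self.counts, key=sq_dist)


class OnlineCostSensitiveBagging:
    """Online bagging where the Poisson rate is the example's class cost.

    Assumption for illustration: a class with cost c is presented to each
    base model Poisson(c) times, so costly (minority) examples are
    effectively oversampled without storing the stream.
    """

    def __init__(self, n_models, class_cost, seed=0):
        self.models = [RunningCentroid() for _ in range(n_models)]
        self.class_cost = class_cost
        self.rng = random.Random(seed)

    def learn(self, x, y):
        lam = self.class_cost.get(y, 1.0)
        for model in self.models:
            for _ in range(poisson(lam, self.rng)):
                model.learn(x, y)

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.models)
        votes.pop(None, None)  # ignore models that have seen no data
        return votes.most_common(1)[0][0] if votes else None


# Synthetic imbalanced stream: roughly 1 minority example per 10 majority.
data_rng = random.Random(1)
ensemble = OnlineCostSensitiveBagging(n_models=10, class_cost={0: 1.0, 1: 5.0})
for i in range(220):
    y = 1 if i % 11 == 0 else 0
    center = 5.0 if y == 1 else 0.0
    x = [data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)]
    ensemble.learn(x, y)  # one pass, one example at a time

print(ensemble.predict([5.0, 5.0]), ensemble.predict([0.0, 0.0]))
```

Each example is processed once and then discarded, which is what makes the scheme online. Tying the Poisson rate to the class cost is only one plausible way to combine online bagging with cost-sensitive resampling.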
