Multi-Window Based Ensemble Learning for Classification of Imbalanced Streaming Data

Imbalanced streaming data is common in real-world applications and has attracted much attention in recent years. Most studies address either imbalanced data or streaming data; in practice, however, the two often occur together. In this paper, we propose a multi-window based ensemble learning method (MWEL for short) for the classification of imbalanced streaming data. Three types of windows are defined to store the current batch of instances, the latest minority instances, and the ensemble classifier. The ensemble classifier consists of a set of the latest sub-classifiers together with the instances on which each sub-classifier was trained. All sub-classifiers are weighted before predicting the class labels of newly arriving instances, and a new sub-classifier is trained when precision falls below a threshold. Extensive experiments on synthetic and real-world datasets demonstrate that the new approach classifies imbalanced streaming data effectively and efficiently and outperforms existing approaches.
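The abstract only outlines MWEL, so the following Python code is a minimal sketch of how the three windows could interact. It is not the authors' implementation. Several choices here are assumptions made only for illustration: the base learner (a nearest-centroid classifier), the weighting rule (each member's accuracy on the latest labeled batch), the retraining set (the current batch plus previously stored minority instances), and the class name `MWELSketch` with its parameter defaults.

```python
from collections import deque


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the label of the closest class mean."""

    def __init__(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(self, x):
        # Squared Euclidean distance to each class centroid.
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


def precision(y_true, y_pred, positive):
    """Precision for the minority (positive) class; 0.0 if nothing was predicted positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in hits) / len(hits) if hits else 0.0


class MWELSketch:
    """Hypothetical three-window ensemble; names and defaults are illustrative only."""

    def __init__(self, minority_label=1, max_minority=50, max_members=5, threshold=0.6):
        self.minority_label = minority_label
        self.threshold = threshold
        self.batch_window = []                              # window 1: current batch
        self.minority_window = deque(maxlen=max_minority)   # window 2: latest minority instances
        self.ensemble_window = deque(maxlen=max_members)    # window 3: (model, X, y) triples
        self.weights = []

    def _reweight(self, X, y):
        # Assumption: a member's weight is its accuracy on the latest labeled batch.
        self.weights = [sum(m.predict(x) == t for x, t in zip(X, y)) / len(y)
                        for m, _, _ in self.ensemble_window]

    def predict(self, x):
        # Weighted vote over all current sub-classifiers.
        votes = {}
        for (model, _, _), w in zip(self.ensemble_window, self.weights):
            label = model.predict(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get) if votes else None  # None: no members yet

    def process_batch(self, X, y):
        """Predict a batch, then use its true labels to update all three windows."""
        self.batch_window = list(zip(X, y))
        preds = [self.predict(x) for x in X]
        old_minority = list(self.minority_window)
        self.minority_window.extend(p for p in self.batch_window if p[1] == self.minority_label)
        if not self.ensemble_window or precision(y, preds, self.minority_label) < self.threshold:
            # Assumption: train on the current batch plus previously stored minority instances.
            train = self.batch_window + old_minority
            tX, ty = [x for x, _ in train], [t for _, t in train]
            self.ensemble_window.append((NearestCentroid(tX, ty), tX, ty))
        self._reweight(X, y)
        return preds
```

The abstract says each sub-classifier is stored together with its training instances, so this sketch keeps them in window 3. It does not reuse them, because the abstract does not say how they are used. The bounded `deque`s make the oldest minority instances and the oldest sub-classifiers drop out automatically as new ones arrive.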
