Imbalanced Data Stream Classification Using Hybrid Data Preprocessing

Imbalanced data streams have attracted significant research interest in recent years. The area is still greatly underdeveloped, and algorithms intended for such a dynamic environment must overcome numerous inherent difficulties before they can achieve satisfactory predictive performance. This paper proposes a novel algorithm that combines over- and under-sampling techniques to create a more robust classifier dedicated to imbalanced data streams. Extensive experiments on real and computer-generated data streams confirm the efficiency and high predictive quality of the proposed method.
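
The general idea of combining over- and under-sampling on a single chunk of a stream can be sketched as follows. This is only an illustrative sketch and not the algorithm proposed in the paper: it assumes plain random oversampling (duplicating minority samples) and plain random undersampling (dropping majority samples), with both classes brought to half the chunk size. The function name `hybrid_resample` and its parameters are hypothetical.

```python
import random


def hybrid_resample(samples, labels, minority_label=1, seed=None):
    """Balance one data chunk by resampling both classes toward a midpoint.

    Illustrative assumption: random duplication of minority samples and
    random removal of majority samples. Not the paper's method.
    """
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    if not minority or not majority:
        # Only one class is present in this chunk, so nothing can be balanced.
        return list(samples), list(labels)

    # Both classes are moved toward half of the chunk size.
    target = (len(minority) + len(majority)) // 2

    # Oversampling: draw extra minority indices with replacement.
    extra = rng.choices(minority, k=max(target - len(minority), 0))
    # Undersampling: keep a random subset of majority indices.
    kept = rng.sample(majority, k=min(target, len(majority)))

    chosen = minority + extra + kept
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# A chunk of 100 samples with a 1:9 imbalance ratio.
chunk = [[float(i), float(i) * 0.5] for i in range(100)]
chunk_labels = [1] * 10 + [0] * 90
new_chunk, new_labels = hybrid_resample(chunk, chunk_labels, seed=0)
print(new_labels.count(1), new_labels.count(0))  # 50 50
```

In a streaming setting, a routine of this kind would be applied to each incoming chunk before the classifier is updated, so the chunk size stays constant while the class ratio is evened out.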
