Handling concept drifts and limited label problems using semi-supervised combine-merge Gaussian mixture model

In data stream prediction, changes in the data distribution can degrade model accuracy over time and eventually make the model obsolete. This phenomenon is known as concept drift. Detecting concept drift and adapting to it are critical for maintaining model performance. However, the model can only be adapted when labeled data is available, and labeling is costly and time-consuming because it must be done by humans. Because stream data is massive and arrives at high speed, only part of it can be labeled. To address both problems at once, we apply a technique that updates the model using both labeled and unlabeled instances. The experimental results show that the proposed method adapts to concept drift using pseudo-labels and maintains its accuracy even when label availability is drastically reduced from 95% to 5%. The proposed method also achieves the highest overall accuracy and outperforms the other methods on 5 of 10 datasets.
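The idea of updating a model from both labeled and pseudo-labeled stream instances can be illustrated with a minimal sketch. This is not the combine-merge Gaussian mixture model described here, and it omits drift detection entirely. It assumes one diagonal Gaussian per class, and the names `GaussianClassModel`, `run_stream`, `label_rate`, `threshold`, and `warmup` are hypothetical choices made for illustration only.

```python
import numpy as np


class GaussianClassModel:
    """One diagonal Gaussian per class, updated online (illustrative stand-in)."""

    def __init__(self, n_classes, n_features):
        self.count = np.zeros(n_classes)
        self.mean = np.zeros((n_classes, n_features))
        self.m2 = np.zeros((n_classes, n_features))  # sum of squared deviations

    def update(self, x, y):
        # Welford's online update of the mean and variance of class y.
        self.count[y] += 1
        delta = x - self.mean[y]
        self.mean[y] += delta / self.count[y]
        self.m2[y] += delta * (x - self.mean[y])

    def posterior(self, x):
        # Per-class variance with a small floor for numerical stability.
        var = self.m2 / np.maximum(self.count[:, None] - 1, 1) + 1e-2
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (x - self.mean) ** 2 / var, axis=1
        )
        log_post = log_lik + np.log(self.count + 1e-9)  # unnormalized class prior
        log_post -= log_post.max()
        p = np.exp(log_post)
        return p / p.sum()


def run_stream(X, y, label_rate=0.05, threshold=0.95, warmup=30, seed=0):
    """Test-then-train loop: true labels arrive with probability `label_rate`;
    otherwise a confident prediction is used as a pseudo-label."""
    rng = np.random.default_rng(seed)
    model = GaussianClassModel(int(y.max()) + 1, X.shape[1])
    correct = 0
    for i, (x, target) in enumerate(zip(X, y)):
        if i < warmup:
            # Assumption: the first `warmup` instances are fully labeled.
            model.update(x, target)
            continue
        p = model.posterior(x)
        pred = int(p.argmax())
        correct += int(pred == target)
        if rng.random() < label_rate:
            model.update(x, target)   # a true label is available
        elif p.max() >= threshold:
            model.update(x, pred)     # confident prediction used as pseudo-label
    return correct / (len(X) - warmup)
```

In this sketch the confidence threshold decides which unlabeled instances are trusted; a low threshold admits more pseudo-labels but risks reinforcing the model's own mistakes, which is why an actual method would pair it with drift detection.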
