Concept Drift Detection and Adaptation with Hierarchical Hypothesis Testing

A fundamental issue for statistical classification models in a streaming environment is that the joint distribution of the predictor and response variables may change over time (a phenomenon known as concept drift), causing classification performance to deteriorate dramatically. In this paper, we first present a hierarchical hypothesis testing (HHT) framework that can detect and adapt to various types of concept drift (e.g., recurrent or irregular, gradual or abrupt), even in the presence of imbalanced data labels. We then implement a novel concept drift detector, Hierarchical Linear Four Rates (HLFR), under the HHT framework. By replacing the widely used retraining scheme with an adaptive training strategy, we further demonstrate that the concept drift adaptation capability of HLFR can be significantly boosted. We also provide a theoretical analysis of the Type-I and Type-II errors of HLFR. Experiments on both simulated and real-world datasets illustrate that our methods outperform state-of-the-art methods in terms of detection precision, detection delay, and adaptability across different concept drift types.
