Concept Drift Detection with Hierarchical Hypothesis Testing

When statistical models (such as classifiers) are deployed in a streaming environment, concept drifts often need to be detected and adapted to so that the model’s predictive performance does not deteriorate over time. Unfortunately, the ability of popular concept drift detection approaches to detect such drifts, that is, changes in the relationship between the response and predictor variables, often depends on the distributional characteristics of the data stream and is sensitive to parameter tuning. This paper presents Hierarchical Linear Four Rates (HLFR), a framework that detects concept drifts across different data stream distributions (including imbalanced data) by applying a hierarchical set of hypothesis tests in an online setting. The performance of HLFR is compared to benchmark approaches using both simulated and real-world datasets spanning the breadth of concept drift types. HLFR significantly outperforms the benchmark approaches in terms of accuracy, G-mean, recall, detection delay, and adaptability across the various datasets.
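
To make the idea of a hierarchical set of hypothesis tests concrete, the following is a minimal Python sketch of a two-layer check on a stream of 0/1 prediction errors. It is not the HLFR algorithm itself: the windowed error-rate screen in the first layer, the permutation test in the second layer, and all names and parameter values (`hierarchical_drift_check`, `window`, `layer1_threshold`, `alpha`) are assumptions chosen for illustration.

```python
import random


def permutation_p_value(before, after, n_perm=1000, seed=0):
    """One-sided permutation test: is the mean error higher in `after`?"""
    rng = random.Random(seed)
    n_before, n_after = len(before), len(after)
    observed = sum(after) / n_after - sum(before) / n_before
    pooled = list(before) + list(after)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[n_before:]) / n_after - sum(pooled[:n_before]) / n_before
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def hierarchical_drift_check(errors, window=50, layer1_threshold=0.15, alpha=0.01):
    """Return the time steps at which both layers agree that a drift occurred.

    errors: sequence of 0/1 prediction errors, in arrival order.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    detections = []
    t = 2 * window
    while t <= len(errors):
        before = errors[t - 2 * window:t - window]
        after = errors[t - window:t]
        # Layer 1 (cheap screen): flag a candidate drift when the windowed
        # error rate jumps by more than the threshold.
        jump = sum(after) / window - sum(before) / window
        if jump > layer1_threshold:
            # Layer 2 (validation): confirm the candidate with a permutation test.
            if permutation_p_value(before, after) < alpha:
                detections.append(t)
                # Skip ahead so both windows lie after the confirmed drift.
                t += 2 * window
                continue
        t += 1
    return detections


# Synthetic stream: 10% error rate for 200 steps, then 60% for 200 steps.
stream = [1 if i % 10 == 0 else 0 for i in range(200)]
stream += [1 if i % 10 < 6 else 0 for i in range(200)]
detections = hierarchical_drift_check(stream)
print(detections)
```

The design choice illustrated here is that the inexpensive first layer runs at every time step, while the more costly second layer runs only on flagged candidates, so that a candidate is reported as a drift only when both tests agree.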
