Concept drift detection based on Fisher's Exact test

Highlights: Efficient implementation of the computationally expensive Fisher's Exact test. Three new concept drift detection methods based on Fisher's Exact test. Tested against DDM, ECDD, SEED, FHDDM, and STEPD using two base classifiers. The proposed methods are significantly superior to most other detectors in accuracy. The proposed methods have better Precision, Recall, and F-Measure than the other methods.

Concept drift detectors are software components that attempt to estimate the positions of concept drifts in large data streams, so that the base learner can be replaced after a change in the data distribution and accuracy thereby improved. The Statistical Test of Equal Proportions (STEPD) is a simple, efficient, and well-known method that detects concept drifts using a hypothesis test between two proportions. Statistically, however, this test is not recommended when sample sizes are small or when data are sparse and/or imbalanced. This article proposes an efficient implementation of the statistically preferred but computationally expensive Fisher's Exact test and examines three slightly different applications of this test to concept drift detection, proposing FPDD, FSDD, and FTDD. Experiments using four artificial dataset generators, each with abrupt and gradual drift versions, as well as three real-world datasets, suggest that in many scenarios the new methods improve on the accuracy and the detections of STEPD and of other well-known and/or recent concept drift detectors, with little impact on memory and run-time usage.
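To make the underlying test concrete, the following is a minimal sketch of a one-sided Fisher's Exact test on a 2x2 table of errors and correct predictions in a recent window versus an older window. It is an illustration only, not the efficient implementation proposed in the article: the function names, the use of log-factorials, and the choice of a one-sided alternative are all assumptions made here.

```python
import math


def log_factorial(n):
    # lgamma(n + 1) equals log(n!) and avoids overflow for large counts.
    return math.lgamma(n + 1)


def table_log_prob(a, b, c, d):
    # Log of the hypergeometric probability of the 2x2 table
    # [[a, b], [c, d]] with all row and column totals held fixed.
    return (log_factorial(a + b) + log_factorial(c + d)
            + log_factorial(a + c) + log_factorial(b + d)
            - log_factorial(a) - log_factorial(b)
            - log_factorial(c) - log_factorial(d)
            - log_factorial(a + b + c + d))


def fisher_one_sided(recent_errors, recent_n, older_errors, older_n):
    # Hypothetical helper: p-value for the alternative "the recent window
    # has a higher error rate than the older window".
    total_errors = recent_errors + older_errors
    max_recent = min(recent_n, total_errors)
    p_value = 0.0
    # Sum the probabilities of every table at least as extreme as the
    # observed one (recent errors >= observed), margins fixed.
    for x in range(recent_errors, max_recent + 1):
        y = total_errors - x  # errors left in the older window
        p_value += math.exp(table_log_prob(x, recent_n - x, y, older_n - y))
    return min(p_value, 1.0)


# 3 errors out of 3 recent predictions versus 0 out of 3 older ones.
p = fisher_one_sided(3, 3, 0, 3)
```

A detector built on this sketch would compare the p-value against chosen significance levels to signal a warning or a drift; the window sizes and thresholds would be parameters and are not specified here.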
