Experimenting with prequential variations for data stream learning evaluation

Processing data streams imposes demands that do not exist in static environments. In online learning, the probability distribution of the data can change over time (concept drift). The prequential assessment methodology is commonly used to evaluate the performance of classifiers on data streams with stationary and non-stationary distributions. It rests on the premise that the purpose of statistical inference is to make sequential probability forecasts for future observations, rather than to express information about the accuracy already achieved. This article empirically evaluates the prequential methodology considering its three common strategies for computing the accuracy estimate, namely Basic Window, Sliding Window, and Fading Factors. Specifically, it aims to identify which of these variations is the most accurate for the experimental evaluation of past results in scenarios where concept drifts occur, with greater interest in the accuracy observed over the entire data stream. The prequential accuracy of the three variations and the real accuracy obtained in the learning process of each dataset are the basis for this evaluation. The results of the experiments carried out suggest that Prequential with the Sliding Window variation is the best alternative.
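To make the three variations concrete, the following is a minimal sketch, not taken from the article, of how prequential accuracy could be tracked under the Basic Window, Sliding Window, and Fading Factors strategies in a test-then-train loop. The class name, the parameters window_size and alpha, and the toy majority-class "model" and synthetic drifting stream in the demo are illustrative assumptions, not the authors' setup.

```python
import random
from collections import deque


class PrequentialAccuracy:
    """Prequential (test-then-train) accuracy under three forgetting strategies:
    basic (landmark) window, sliding window, and fading factors."""

    def __init__(self, window_size=1000, alpha=0.999):
        self.hits = 0                                 # basic window: correct predictions so far
        self.seen = 0                                 # basic window: examples seen so far
        self.window = deque(maxlen=window_size)       # sliding window of 0/1 outcomes
        self.alpha = alpha                            # fading factor in (0, 1)
        self.faded_hits = 0.0                         # S_i = hit_i + alpha * S_{i-1}
        self.faded_seen = 0.0                         # N_i = 1     + alpha * N_{i-1}

    def update(self, y_true, y_pred):
        hit = int(y_true == y_pred)
        self.hits += hit
        self.seen += 1
        self.window.append(hit)
        self.faded_hits = hit + self.alpha * self.faded_hits
        self.faded_seen = 1.0 + self.alpha * self.faded_seen

    def basic(self):
        return self.hits / self.seen if self.seen else 0.0

    def sliding(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

    def fading(self):
        return self.faded_hits / self.faded_seen if self.faded_seen else 0.0


if __name__ == "__main__":
    # Toy test-then-train loop on a synthetic binary stream with an abrupt
    # concept drift at example 5000; the "model" is a majority-class baseline,
    # used only to exercise the three estimators.
    random.seed(1)
    acc = PrequentialAccuracy(window_size=500, alpha=0.995)
    counts = [0, 0]                                   # class counts seen so far
    for i in range(10_000):
        p_one = 0.8 if i < 5_000 else 0.2             # drift: P(y=1) changes
        y = 1 if random.random() < p_one else 0
        y_hat = 0 if counts[0] >= counts[1] else 1    # test first ...
        acc.update(y, y_hat)
        counts[y] += 1                                # ... then train
    print(f"basic={acc.basic():.3f}  "
          f"sliding={acc.sliding():.3f}  "
          f"fading={acc.fading():.3f}")
```

In this sketch, the Basic Window never forgets, so its estimate mixes pre- and post-drift behavior, whereas the Sliding Window and Fading Factors discount old outcomes and therefore track the accuracy observed after the drift more closely, which is the difference the article's evaluation examines.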
