On the Inter-Relationships Among Drift Rate, Forgetting Rate, Bias/Variance Profile and Error

We propose two general, falsifiable hypotheses about generalization error when learning under concept drift. The first posits that as drift rate increases, the forgetting rate that minimizes generalization error also increases, and vice versa. The second posits that as a learner's forgetting rate increases, the bias/variance profile that minimizes generalization error shifts toward lower variance, and vice versa. Together these hypotheses imply a sweet path: a path through the three-dimensional space of drift rates, forgetting rates and bias/variance profiles along which generalization error is minimized, on which slow drift is coupled with slow forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with state-of-the-art incremental learners on real-world concept drift problems.
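To make the two hypotheses concrete, the sketch below simulates them on a toy stream. It is our own illustration under stated assumptions, not the paper's experimental setup: the rotating-boundary generator, the parameter grids, and the use of a sliding-window k-nearest-neighbour learner are all choices made for exposition. Window size stands in for (inverse) forgetting rate, and k acts as the bias/variance knob (k=1 is low-bias/high-variance; larger k trades bias for lower variance).

    # Illustrative sketch only, not the paper's method. A 2-D stream's class
    # boundary rotates by `drift_rate` radians per step; a k-NN classifier is
    # trained on a sliding window of the most recent `window` examples.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    def prequential_error(drift_rate, window, k, n_steps=1000):
        """Test-then-train error of a windowed k-NN on a rotating boundary."""
        X_buf, y_buf, errors = [], [], []
        theta = 0.0
        for _ in range(n_steps):
            x = rng.normal(size=2)
            boundary = np.array([np.cos(theta), np.sin(theta)])
            y = int(x @ boundary > 0)        # label = side of current boundary
            if len(X_buf) >= k:              # test-then-train: predict first
                clf = KNeighborsClassifier(n_neighbors=k)
                clf.fit(np.array(X_buf), np.array(y_buf))
                errors.append(int(clf.predict(x[None, :])[0] != y))
            X_buf.append(x)
            y_buf.append(y)
            X_buf, y_buf = X_buf[-window:], y_buf[-window:]  # forget old data
            theta += drift_rate              # the concept drifts
        return float(np.mean(errors))

    for drift_rate in (0.0, 0.005, 0.02):
        grid = [(prequential_error(drift_rate, w, k), w, k)
                for w in (25, 100, 400) for k in (1, 7)]
        err, w, k = min(grid)
        print(f"drift_rate={drift_rate}: best window={w}, k={k}, "
              f"error={err:.3f}")

Under these assumptions one would expect the long window (slow forgetting) to win when drift_rate is zero and the short window (fast forgetting) to win as drift accelerates, with the lower-variance setting of k becoming more attractive as the window shrinks; that is exactly the sweet-path pattern the abstract describes.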
