New Analysis and Algorithm for Learning with Drifting Distributions

We present a new analysis of the problem of learning with drifting distributions in the batch setting using the notion of discrepancy. We prove learning bounds based on the Rademacher complexity of the hypothesis set and the discrepancy of distributions, both for a drifting PAC scenario and a tracking scenario. Our bounds are always tighter than previous ones based on the L1 distance, and in some cases improve upon them substantially. We also present a generalization of the standard on-line to batch conversion to the drifting scenario in terms of the discrepancy and arbitrary convex combinations of hypotheses. We introduce a new algorithm exploiting these learning guarantees, which we show can be formulated as a simple quadratic program (QP). Finally, we report the results of preliminary experiments demonstrating the benefits of this algorithm.
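
For readers unfamiliar with the term, the discrepancy used in this line of work is, as a sketch, the loss-based quantity from the domain adaptation literature (the paper's own definition may differ in minor details): for a hypothesis set H, a loss function L, and distributions P and Q over the input space,

\[
\mathrm{disc}(P, Q) \;=\; \sup_{h, h' \in H} \Bigl| \mathbb{E}_{x \sim P}\bigl[L(h'(x), h(x))\bigr] \;-\; \mathbb{E}_{x \sim Q}\bigl[L(h'(x), h(x))\bigr] \Bigr|,
\]

that is, the largest change in expected pairwise loss over the hypothesis set when the distribution shifts from P to Q. Because it depends on the hypothesis set and loss actually used, it can be much smaller than the L1 distance, which is what drives the tighter bounds claimed above.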

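The abstract states only that the algorithm can be formulated as a simple QP. The sketch below is a hypothetical illustration of the general pattern such a formulation could take, not the paper's actual algorithm: choose nonnegative weights q_1, ..., q_T over the T time steps of the drifting sample, summing to one, that trade off estimated discrepancy to the target (most recent) distribution against the weight-vector norm that typically appears in Rademacher-style bounds, then train with the weighted empirical risk. The per-step discrepancy estimates d, the trade-off parameter lam, and the objective itself are assumptions introduced for illustration; cvxpy is used only as a convenient off-the-shelf QP solver.

# Hypothetical sketch: choose per-time-step weights by solving a small QP,
# then use them in a weighted empirical risk. Illustrates a generic
# "discrepancy-weighted ERM" pattern, not the paper's exact algorithm.
import numpy as np
import cvxpy as cp

def drift_weights(d, lam=1.0):
    """d[t]: assumed estimate of the discrepancy between the distribution at
    time t and the target (most recent) distribution; lam trades the
    discrepancy term off against the squared norm of the weights."""
    T = len(d)
    q = cp.Variable(T, nonneg=True)          # one weight per time step
    objective = cp.Minimize(q @ d + lam * cp.sum_squares(q))
    problem = cp.Problem(objective, [cp.sum(q) == 1])
    problem.solve()
    return np.asarray(q.value)

# Example: 10 time steps, with (hypothetical) discrepancy estimates that are
# larger for older data; the QP assigns older data correspondingly less weight.
d = np.linspace(1.0, 0.1, 10)
q = drift_weights(d, lam=0.5)
print(np.round(q, 3))

In this sketch, a larger lam pushes the solution toward uniform weights (use all data equally), while a small lam concentrates the weight on the time steps whose distribution looks closest to the target, which mirrors the intuition behind discrepancy-based guarantees.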