The Generalization Ability of Online Algorithms for Dependent Data

We study the generalization performance of online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret (an easily computable statistic of the online performance of the algorithm) when the underlying ergodic process is β- or φ-mixing. We show high-probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory.
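To make "regret as an easily computable statistic" concrete, the following is a minimal sketch and not the paper's own experiment. It assumes a scalar linear model with squared loss, data drawn from an AR(1) chain (a standard example of a mixing process), and projected online gradient descent with a 1/sqrt(t) step size. All names and parameter values (`ar1_stream`, `rho`, `radius`, `step`) are hypothetical choices made for illustration.

```python
import random

def ar1_stream(n, rho=0.8, w_star=1.5, noise=0.1, seed=0):
    """Yield dependent (x_t, y_t) pairs; x_t follows an AR(1) chain (assumed setup)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)       # each sample depends on the previous one
        y = w_star * x + rng.gauss(0.0, noise)  # noisy linear response
        yield x, y

def loss(w, x, y):
    """Squared loss of the scalar predictor w on one sample."""
    return 0.5 * (w * x - y) ** 2

def clip(w, radius):
    """Project w onto the interval [-radius, radius]."""
    return max(-radius, min(radius, w))

def online_gradient_descent(data, radius=5.0, step=0.5):
    """Run projected online gradient descent; return final w and cumulative online loss."""
    w, online_loss = 0.0, 0.0
    for t, (x, y) in enumerate(data, start=1):
        online_loss += loss(w, x, y)            # loss is suffered before the update
        grad = (w * x - y) * x
        w = clip(w - step / t ** 0.5 * grad, radius)
    return w, online_loss

def best_fixed_loss(data, radius=5.0):
    """Cumulative loss of the best fixed w in [-radius, radius], chosen in hindsight."""
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    # For a one-dimensional convex quadratic, clipping the least-squares
    # solution gives the constrained minimizer.
    w = clip(sxy / sxx, radius) if sxx > 0 else 0.0
    return w, sum(loss(w, x, y) for x, y in data)

data = list(ar1_stream(2000))
w_final, online_loss = online_gradient_descent(data)
w_best, comparator_loss = best_fixed_loss(data)
average_regret = (online_loss - comparator_loss) / len(data)
print(f"final w = {w_final:.3f}, best fixed w = {w_best:.3f}")
print(f"average regret = {average_regret:.5f}")
```

The average regret is computed entirely from the observed sequence, with no knowledge of the data distribution; that is the sense in which the abstract calls it easily computable.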
