From exp-concavity to variance control: High probability O(1/n) rates and high probability online-to-batch conversion

We present an algorithm for the statistical learning setting with a bounded exp-concave loss in $d$ dimensions that obtains excess risk $O(d/n)$ with high probability: the dependence on the confidence parameter $\delta$ is polylogarithmic in $1/\delta$. The core technique is to boost the confidence of recent $O(d/n)$ in-expectation bounds, without sacrificing the rate, by leveraging a Bernstein-type condition that holds due to exp-concavity. This Bernstein-type condition implies that the variance of each excess loss random variable is controlled in terms of its excess risk. Using this variance control, we further show that a regret bound for any online learner in this setting translates into a high-probability excess risk bound for the corresponding online-to-batch conversion of that learner.
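To make the variance-control statement concrete, the following is a minimal sketch of the kind of Bernstein-type condition meant here; the notation ($\ell$, $f^\star$, $\mathcal{F}$, the constant $B$) is introduced for illustration and the exact form of the constants is an assumption, not quoted from the paper:

$$\mathbb{E}\Big[\big(\ell(f, Z) - \ell(f^\star, Z)\big)^2\Big] \;\le\; B \,\mathbb{E}\big[\ell(f, Z) - \ell(f^\star, Z)\big] \qquad \text{for all } f \in \mathcal{F},$$

where $Z$ is a random example, $f^\star$ is the risk minimizer over the class $\mathcal{F}$, and $B$ depends on the exp-concavity parameter and the loss bound. Because the second moment of the excess loss is bounded by a multiple of its mean, Bernstein/Freedman-type martingale concentration yields deviation terms of order $\log(1/\delta)/n$ rather than $\sqrt{\log(1/\delta)/n}$, which is what allows the $O(d/n)$ rate to survive at high confidence.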
