Fast rates with high probability in exp-concave statistical learning

We present an algorithm for the statistical learning setting with a bounded exp-concave loss in $d$ dimensions that obtains excess risk $O(d \log(1/\delta)/n)$ with probability at least $1 - \delta$. The core technique is to boost the confidence of recent in-expectation $O(d/n)$ excess risk bounds for empirical risk minimization (ERM), without sacrificing the rate, by leveraging a Bernstein condition which holds due to exp-concavity. We also show that, with probability at least $1 - \delta$, standard ERM obtains excess risk $O(d (\log(n) + \log(1/\delta))/n)$. We further show that a regret bound for any online learner in this setting translates to a high-probability excess risk bound for the corresponding online-to-batch conversion of that learner. Lastly, we present two high-probability bounds for the exp-concave model selection aggregation problem that are quantile-adaptive in a certain sense. The first bound holds for a purely exponential weights type algorithm, achieves a nearly optimal rate, and has no explicit dependence on the Lipschitz continuity of the loss. The second bound requires Lipschitz continuity but achieves the optimal rate.
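
To make the confidence-boosting idea concrete, the sketch below shows one standard version of such a scheme: split half the sample into roughly $\log(1/\delta)$ chunks, run ERM on each chunk to produce candidate hypotheses, and select the candidate with the smallest empirical risk on the held-out half. This is only an illustrative sketch under stated assumptions, not the paper's precise algorithm: the callables `erm` and `loss`, the equal train/validation split, and the chunk count are placeholders, and the claim that selection preserves the fast rate rests on the Bernstein condition discussed above.

```python
# Minimal sketch of a generic "boost the confidence" scheme for ERM.
# Assumptions (not from the paper's pseudocode): `erm` fits a hypothesis from a
# list of examples, `loss(h, z)` is a bounded exp-concave loss, and n is large
# relative to log(1/delta).
import math
import numpy as np

def boost_confidence_erm(data, erm, loss, delta=0.05):
    """Return a hypothesis intended to have small excess risk with prob. >= 1 - delta.

    data : indexable sequence of i.i.d. examples
    erm  : callable mapping a list of examples to a fitted hypothesis
    loss : callable loss(h, example) -> float
    """
    n = len(data)
    half = n // 2
    train, valid = data[:half], data[half:]

    # Number of independent ERM candidates, on the order of log(1/delta).
    k = max(1, math.ceil(math.log(1.0 / delta)))

    # Fit one ERM candidate per chunk of the training half.
    chunks = np.array_split(np.arange(half), k)
    candidates = [erm([train[i] for i in idx]) for idx in chunks]

    # Select the candidate with smallest empirical risk on the validation half.
    val_risks = [np.mean([loss(h, z) for z in valid]) for h in candidates]
    return candidates[int(np.argmin(val_risks))]
```

The role of the split is to convert the in-expectation guarantee of each ERM candidate into a high-probability guarantee for the selected hypothesis; the point emphasized in the abstract is that, under the Bernstein condition implied by exp-concavity, the validation step contributes only an $O(\log(1/\delta)/n)$ term rather than the slow $O(\sqrt{\log(1/\delta)/n})$ penalty.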
