Fast Rates for Exp-concave Empirical Risk Minimization

We consider Empirical Risk Minimization (ERM) in the context of stochastic optimization with exp-concave and smooth losses, a general optimization framework that captures several important learning problems, including linear and logistic regression, SVMs with the squared hinge loss, portfolio selection, and more. In this setting, we provide the first evidence that ERM is able to attain fast generalization rates, showing that the expected loss of the ERM solution in d dimensions converges to the optimal expected loss at a rate of d/n. This rate matches existing lower bounds up to constants and improves by a log n factor upon the state of the art, which was previously only known to be attained via online-to-batch conversion of computationally expensive online algorithms.
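For concreteness, here is a minimal formalization of the setting described in the abstract; the notation (decision set W, loss f, exp-concavity parameter alpha) is illustrative and not taken verbatim from the paper. A loss $f\colon \mathcal{W} \to \mathbb{R}$ is $\alpha$-exp-concave if $w \mapsto \exp(-\alpha f(w))$ is concave on $\mathcal{W}$. Given i.i.d. samples $z_1,\dots,z_n$ and population risk $F(w) = \mathbb{E}_z[f(w;z)]$, the ERM solution and the claimed excess-risk rate read:

\[
  \hat{w}_{\mathrm{ERM}} \in \operatorname*{arg\,min}_{w \in \mathcal{W}} \ \frac{1}{n} \sum_{i=1}^{n} f(w; z_i),
  \qquad
  \mathbb{E}\!\left[ F(\hat{w}_{\mathrm{ERM}}) \right] - \min_{w \in \mathcal{W}} F(w) \;=\; O\!\left( \frac{d}{n} \right),
\]

where $d$ is the dimension of $\mathcal{W}$ and the $O(\cdot)$ hides constants depending on the exp-concavity and smoothness parameters of the loss.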
