High probability guarantees for stochastic convex optimization

Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on “light-tail” noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. The procedure we propose, called proxBoost, is elementary and builds on two well-known ingredients: robust distance estimation and the proximal point method. We discuss consequences for both streaming (online) algorithms and offline algorithms based on empirical risk minimization.
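The robust distance estimation ingredient mentioned above can be illustrated with a small sketch: run a base stochastic algorithm several times independently, then select the trial whose median distance to the other trials is smallest, so that a minority of bad runs cannot corrupt the output. The helper name `robust_select` and the numbers below are illustrative, not from the paper, and the full proxBoost procedure additionally interleaves this selection with proximal point steps.

```python
import numpy as np

def robust_select(points):
    """Robust distance estimation: among candidate points, return the one
    whose median distance to all candidates is smallest. If a majority of
    the points land near the true minimizer, the selected point does too,
    with failure probability decaying exponentially in the number of trials."""
    pts = np.asarray(points, dtype=float)
    # pairwise Euclidean distances between all candidate points
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    med = np.median(dists, axis=1)
    return pts[np.argmin(med)]

# Toy demonstration: 8 "good" trials near the true minimizer and 3 outliers.
rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0])
good = truth + 0.1 * rng.standard_normal((8, 2))
bad = truth + 10.0 * rng.standard_normal((3, 2))
selected = robust_select(np.vstack([good, bad]))
# The selected point lies close to the truth despite the outliers.
```

Note that averaging the trials instead would be dragged far off by the three outliers; the median-of-distances selection is what buys the high-confidence guarantee.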
