High probability guarantees for stochastic convex optimization

Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on “light-tail” noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. The procedure we propose, called proxBoost, is elementary and builds on two well-known ingredients: robust distance estimation and the proximal point method. We discuss consequences for both streaming (online) algorithms and offline algorithms based on empirical risk minimization.
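The robust distance estimation ingredient mentioned above can be illustrated with a small sketch: run a base stochastic algorithm several times independently, then select the trial whose median distance to the other trials is smallest, so that a minority of bad runs cannot corrupt the output. The helper name `robust_select` and the numbers below are illustrative, not from the paper, and the full proxBoost procedure additionally interleaves this selection with proximal point steps.

```python
import numpy as np

def robust_select(points):
    """Robust distance estimation: among candidate points, return the one
    whose median distance to all candidates is smallest. If a majority of
    the points land near the true minimizer, the selected point does too,
    with failure probability decaying exponentially in the number of trials."""
    pts = np.asarray(points, dtype=float)
    # pairwise Euclidean distances between all candidate points
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    med = np.median(dists, axis=1)
    return pts[np.argmin(med)]

# Toy demonstration: 8 "good" trials near the true minimizer and 3 outliers.
rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0])
good = truth + 0.1 * rng.standard_normal((8, 2))
bad = truth + 10.0 * rng.standard_normal((3, 2))
selected = robust_select(np.vstack([good, bad]))
# The selected point lies close to the truth despite the outliers.
```

Note that averaging the trials instead would be dragged far off by the three outliers; the median-of-distances selection is what buys the high-confidence guarantee.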
