Stochastic Bias-Reduced Gradient Methods

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer x⋆ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte Carlo approach due to Blanchet and Glynn [26] to turn any optimal stochastic gradient method into an estimator of x⋆ with bias δ, variance O(log(1/δ)), and an expected sampling cost of O(log(1/δ)) stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of N functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially private non-smooth stochastic optimization method.
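
To make the construction concrete, the following is a minimal Python sketch of the multilevel scheme described above, not the paper's exact algorithm: the inner solver sgd_iterates, the level cap j_max, the step schedule, and the toy quadratic objective are all illustrative assumptions. The estimator samples a level J with P(J = j) ∝ 2^{-j}, truncated at j_max ≈ log2(1/δ), runs SGD for 2^J steps, and reweights the telescoping difference between the 2^J-step and 2^{J-1}-step iterates; the truncation is the only source of bias, and the expected number of gradient evaluations per draw is O(j_max).

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_iterates(x0, grad_oracle, num_steps, step_schedule):
    """Plain SGD, returning the whole iterate trajectory."""
    xs = [x0]
    x = x0
    for t in range(num_steps):
        x = x - step_schedule(t) * grad_oracle(x)
        xs.append(x)
    return xs

def bias_reduced_estimate(x0, grad_oracle, j_max, step_schedule):
    """One draw of a multilevel Monte Carlo estimate of the minimizer.

    Sample a level J with P(J = j) proportional to 2^{-j}, truncated at
    j_max (which sets the bias, roughly 2^{-j_max}); run SGD for 2^J steps;
    reweight the telescoping difference between the 2^J-step and
    2^{J-1}-step iterates by 1 / P(J = j).
    """
    probs = 2.0 ** -np.arange(j_max + 1)
    probs /= probs.sum()
    J = rng.choice(j_max + 1, p=probs)
    xs = sgd_iterates(x0, grad_oracle, 2 ** J, step_schedule)
    if J == 0:
        return xs[1]
    # Both levels are read off one trajectory, so they share randomness;
    # this coupling is what keeps the variance of the correction term low.
    return xs[1] + (xs[2 ** J] - xs[2 ** (J - 1)]) / probs[J]

# Toy 1-strongly-convex problem: f(x) = 0.5 * (x - 3)^2, noisy gradients.
noisy_grad = lambda x: (x - 3.0) + rng.normal(scale=0.5)
step = lambda t: 1.0 / (t + 1)  # classic 1/(mu*(t+1)) schedule with mu = 1

draws = [bias_reduced_estimate(0.0, noisy_grad, 10, step) for _ in range(2000)]
print(np.mean(draws))  # concentrates near the minimizer x* = 3
```

Because both levels come from a single coupled trajectory, the level-J correction has second moment comparable to its sampling probability, so each of the O(log(1/δ)) levels contributes O(1) variance; this is how the variance and expected-cost bounds quoted in the abstract arise.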

[1] Aaron Roth et al. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci., 2014.

[2] Nathan Srebro et al. Tight Complexity Bounds for Optimizing Composite Objectives. NIPS, 2016.

[3] Zeyuan Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res., 2016.

[4] Yi Ma et al. Towards Unified Acceleration of High-Order Algorithms under Hölder Continuity and Uniform Convexity. arXiv, 2019.

[5] Yin Tat Lee et al. Complexity of Highly Parallel Non-Smooth Convex Optimization. NeurIPS, 2019.

[6] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization. Princeton University Press, 2009.

[7] Raef Bassily et al. Private Stochastic Convex Optimization with Optimal Rates. NeurIPS, 2019.

[8] Yurii Nesterov. Lectures on Convex Optimization. Springer, 2018.

[9] Guanghui Lan. Gradient sliding for composite optimization. Mathematical Programming, 2014.

[10] Elad Hazan et al. An optimal algorithm for stochastic strongly-convex optimization. arXiv:1006.2425, 2010.

[11] Yin Tat Lee et al. Acceleration with a Ball Optimization Oracle. NeurIPS, 2020.

[12] Arkadi Nemirovski and David Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[13] Martin J. Wainwright et al. Randomized Smoothing for Stochastic Optimization. SIAM J. Optim., 2011.

[14] Yuyang Shi et al. On Multilevel Monte Carlo Unbiased Gradient Estimation for Deep Latent Variable Models. AISTATS, 2021.

[15] Guanghui Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 2013.

[16] Vladimir Vapnik. An overview of statistical learning theory. IEEE Trans. Neural Networks, 1999.

[17] Kunal Talwar et al. Private stochastic convex optimization: optimal rates in linear time. STOC, 2020.

[18] L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 1979.

[19] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn., 2014.

[20] Dmitriy Drusvyatskiy et al. Stochastic model-based minimization of weakly convex functions. SIAM J. Optim., 2018.

[21] Tomer Koren et al. Private Stochastic Convex Optimization: Optimal Rates in ℓ1 Geometry. ICML, 2021.

[22] Sewoong Oh et al. Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method. NeurIPS, 2020.

[23] David P. Woodruff et al. Sublinear Optimization for Machine Learning. FOCS, 2010.

[24] Yair Carmon et al. Large-Scale Methods for Distributionally Robust Optimization. NeurIPS, 2020.

[25] Eli Upfal et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

[26] Jose H. Blanchet and Peter W. Glynn. Unbiased Monte Carlo for optimization and functions of expectations via multi-level randomization. Winter Simulation Conference (WSC), 2015.

[27] Michael Cohen et al. On Acceleration with Noise-Corrupted Gradients. ICML, 2018.

[28] N. Z. Shor. Cut-off method with space extension in convex programming problems. Cybernetics, 1977.

[29] John C. Duchi et al. Certifying Some Distributional Robustness with Principled Adversarial Training. ICLR, 2017.

[30] Yonatan Wexler et al. Minimizing the Maximal Loss: How and Why. ICML, 2016.

[31] Dmitriy Drusvyatskiy. The proximal point method revisited. arXiv:1712.06038, 2017.

[32] Arkadi Nemirovski. On Parallel Complexity of Nonsmooth Convex Optimization. J. Complex., 1994.

[33] Anand D. Sarwate et al. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res., 2009.

[34] Zeyuan Allen-Zhu et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. ITCS, 2014.

[35] Renato D. C. Monteiro et al. An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and Its Implications to Second-Order Methods. SIAM J. Optim., 2013.

[36] Sham M. Kakade et al. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. ICML, 2015.

[37] Yin Tat Lee et al. An improved cutting plane method for convex optimization, convex-concave games, and its applications. STOC, 2020.

[38] Saharon Shelah et al. Nearly Linear Time. Logic at Botik, 1989.

[39] Stefan Heinrich. Multilevel Monte Carlo Methods. LSSC, 2001.

[40] Janardhan Kulkarni et al. Private Non-smooth Empirical Risk Minimization and Stochastic Convex Optimization in Subquadratic Steps. arXiv, 2021.

[41] Jelena Diakonikolas et al. Lower Bounds for Parallel and Randomized Convex Optimization. COLT, 2018.

[42] Zaïd Harchaoui et al. A Universal Catalyst for First-Order Optimization. NIPS, 2015.

[43] Stephen P. Boyd et al. Proximal Algorithms. Found. Trends Optim., 2013.

[44] Ohad Shamir et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. ICML, 2011.

[45] Ohad Shamir et al. Stochastic Convex Optimization. COLT, 2009.

[46] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[47] Yair Carmon et al. Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss. COLT, 2021.

[48] Peter W. Glynn et al. Unbiased Multilevel Monte Carlo: Stochastic Optimization, Steady-state Simulation, Quantiles, and Other Applications. arXiv:1904.09929, 2019.

[49] Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 2012.

[50] Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. Oxford University Press, 2005.

[51] Raef Bassily et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. arXiv:1405.7085, 2014.

[52] Yang Kang et al. Semi-supervised Learning Based on Distributionally Robust Optimization. 2020.