Stochastic Bias-Reduced Gradient Methods

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer x⋆ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte Carlo approach due to Blanchet and Glynn [26] to turn any optimal stochastic gradient method into an estimator of x⋆ with bias δ, variance O(log(1/δ)), and an expected sampling cost of O(log(1/δ)) stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of N functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially private non-smooth stochastic optimization method.
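
To make the construction concrete, the following is a minimal Python sketch of the multilevel scheme described above, not the paper's exact algorithm: the inner solver sgd_iterates, the level cap j_max, the step schedule, and the toy quadratic objective are all illustrative assumptions. The estimator samples a level J with P(J = j) ∝ 2^{-j}, truncated at j_max ≈ log2(1/δ), runs SGD for 2^J steps, and reweights the telescoping difference between the 2^J-step and 2^{J-1}-step iterates; the truncation is the only source of bias, and the expected number of gradient evaluations per draw is O(j_max).

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_iterates(x0, grad_oracle, num_steps, step_schedule):
    """Plain SGD, returning the whole iterate trajectory."""
    xs = [x0]
    x = x0
    for t in range(num_steps):
        x = x - step_schedule(t) * grad_oracle(x)
        xs.append(x)
    return xs

def bias_reduced_estimate(x0, grad_oracle, j_max, step_schedule):
    """One draw of a multilevel Monte Carlo estimate of the minimizer.

    Sample a level J with P(J = j) proportional to 2^{-j}, truncated at
    j_max (which sets the bias, roughly 2^{-j_max}); run SGD for 2^J steps;
    reweight the telescoping difference between the 2^J-step and
    2^{J-1}-step iterates by 1 / P(J = j).
    """
    probs = 2.0 ** -np.arange(j_max + 1)
    probs /= probs.sum()
    J = rng.choice(j_max + 1, p=probs)
    xs = sgd_iterates(x0, grad_oracle, 2 ** J, step_schedule)
    if J == 0:
        return xs[1]
    # Both levels are read off one trajectory, so they share randomness;
    # this coupling is what keeps the variance of the correction term low.
    return xs[1] + (xs[2 ** J] - xs[2 ** (J - 1)]) / probs[J]

# Toy 1-strongly-convex problem: f(x) = 0.5 * (x - 3)^2, noisy gradients.
noisy_grad = lambda x: (x - 3.0) + rng.normal(scale=0.5)
step = lambda t: 1.0 / (t + 1)  # classic 1/(mu*(t+1)) schedule with mu = 1

draws = [bias_reduced_estimate(0.0, noisy_grad, 10, step) for _ in range(2000)]
print(np.mean(draws))  # concentrates near the minimizer x* = 3
```

Because both levels come from a single coupled trajectory, the level-J correction has second moment comparable to its sampling probability, so each of the O(log(1/δ)) levels contributes O(1) variance; this is how the variance and expected-cost bounds quoted in the abstract arise.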

[1] Aaron Roth et al. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci., 2014.

[2] Nathan Srebro et al. Tight Complexity Bounds for Optimizing Composite Objectives. NIPS, 2016.

[3] Zeyuan Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res., 2016.

[4] Yi Ma et al. Towards Unified Acceleration of High-Order Algorithms under Hölder Continuity and Uniform Convexity. arXiv, 2019.

[5] Yin Tat Lee et al. Complexity of Highly Parallel Non-Smooth Convex Optimization. NeurIPS, 2019.

[6] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization. Princeton University Press, 2009.

[7] Raef Bassily et al. Private Stochastic Convex Optimization with Optimal Rates. NeurIPS, 2019.

[8] Yurii Nesterov. Lectures on Convex Optimization. Springer, 2018.

[9] Guanghui Lan. Gradient sliding for composite optimization. Mathematical Programming, 2014.

[10] Elad Hazan et al. An optimal algorithm for stochastic strongly-convex optimization. arXiv:1006.2425, 2010.

[11] Yin Tat Lee et al. Acceleration with a Ball Optimization Oracle. NeurIPS, 2020.

[12] Arkadi Nemirovski and David Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[13] Martin J. Wainwright et al. Randomized Smoothing for Stochastic Optimization. SIAM J. Optim., 2011.

[14] Yuyang Shi et al. On Multilevel Monte Carlo Unbiased Gradient Estimation for Deep Latent Variable Models. AISTATS, 2021.

[15] Guanghui Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 2013.

[16] Vladimir Vapnik. An overview of statistical learning theory. IEEE Trans. Neural Networks, 1999.

[17] Kunal Talwar et al. Private stochastic convex optimization: optimal rates in linear time. STOC, 2020.

[18] L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 1979.

[19] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn., 2014.

[20] Dmitriy Drusvyatskiy et al. Stochastic model-based minimization of weakly convex functions. SIAM J. Optim., 2018.

[21] Tomer Koren et al. Private Stochastic Convex Optimization: Optimal Rates in ℓ1 Geometry. ICML, 2021.

[22] Sewoong Oh et al. Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method. NeurIPS, 2020.

[23] David P. Woodruff et al. Sublinear Optimization for Machine Learning. FOCS, 2010.

[24] Yair Carmon et al. Large-Scale Methods for Distributionally Robust Optimization. NeurIPS, 2020.

[25] Eli Upfal et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

[26] Jose H. Blanchet and Peter W. Glynn. Unbiased Monte Carlo for optimization and functions of expectations via multi-level randomization. Winter Simulation Conference (WSC), 2015.

[27] Michael Cohen et al. On Acceleration with Noise-Corrupted Gradients. ICML, 2018.

[28] N. Z. Shor. Cut-off method with space extension in convex programming problems. Cybernetics, 1977.

[29] John C. Duchi et al. Certifying Some Distributional Robustness with Principled Adversarial Training. ICLR, 2017.

[30] Yonatan Wexler et al. Minimizing the Maximal Loss: How and Why. ICML, 2016.

[31] Dmitriy Drusvyatskiy. The proximal point method revisited. arXiv:1712.06038, 2017.

[32] Arkadi Nemirovski. On Parallel Complexity of Nonsmooth Convex Optimization. J. Complex., 1994.

[33] Anand D. Sarwate et al. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res., 2009.

[34] Zeyuan Allen-Zhu et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. ITCS, 2014.

[35] Renato D. C. Monteiro et al. An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and Its Implications to Second-Order Methods. SIAM J. Optim., 2013.

[36] Sham M. Kakade et al. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. ICML, 2015.

[37] Yin Tat Lee et al. An improved cutting plane method for convex optimization, convex-concave games, and its applications. STOC, 2020.

[38] Saharon Shelah et al. Nearly Linear Time. Logic at Botik, 1989.

[39] Stefan Heinrich. Multilevel Monte Carlo Methods. LSSC, 2001.

[40] Janardhan Kulkarni et al. Private Non-smooth Empirical Risk Minimization and Stochastic Convex Optimization in Subquadratic Steps. arXiv, 2021.

[41] Jelena Diakonikolas et al. Lower Bounds for Parallel and Randomized Convex Optimization. COLT, 2018.

[42] Zaïd Harchaoui et al. A Universal Catalyst for First-Order Optimization. NIPS, 2015.

[43] Stephen P. Boyd et al. Proximal Algorithms. Found. Trends Optim., 2013.

[44] Ohad Shamir et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. ICML, 2011.

[45] Ohad Shamir et al. Stochastic Convex Optimization. COLT, 2009.

[46] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[47] Yair Carmon et al. Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss. COLT, 2021.

[48] Peter W. Glynn et al. Unbiased Multilevel Monte Carlo: Stochastic Optimization, Steady-state Simulation, Quantiles, and Other Applications. arXiv:1904.09929, 2019.

[49] Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 2012.

[50] Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. Oxford University Press, 2005.

[51] Raef Bassily et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. arXiv:1405.7085, 2014.

[52] Yang Kang et al. Semi-supervised Learning Based on Distributionally Robust Optimization. 2020.