Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots,f_N$. For non-smooth functions, existing methods require $O(N\epsilon^{-2})$ queries to a first-order oracle to compute an $\epsilon$-suboptimal point and $\tilde{O}(N\epsilon^{-1})$ queries if the $f_i$ are $O(1/\epsilon)$-smooth. We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \epsilon^{-8/3})$ in the non-smooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the $O(1/\epsilon)$-smooth case. Our methods consist of a recently proposed ball optimization oracle acceleration algorithm (which we refine) and a careful implementation of said oracle for the softmax function. We also prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.
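To make the two ingredients concrete: a ball optimization oracle returns an (approximate) minimizer of the objective over a small ball around the query point, and the softmax is the standard log-sum-exp smoothing of the maximal loss. The sketch below states both; the regularization level $\epsilon'$ and its setting $\epsilon' = \epsilon/(2\log N)$ are illustrative choices for this sketch rather than the paper's exact parameters.

\[
\mathcal{O}_r(\bar{x}) \;\approx\; \operatorname*{argmin}_{x \,:\, \|x-\bar{x}\| \le r} f(x),
\qquad
f_{\mathrm{smax}}(x) \;=\; \epsilon' \log \sum_{i \in [N]} \exp\!\left(\frac{f_i(x)}{\epsilon'}\right).
\]

The log-sum-exp surrogate satisfies $\max_{i\in[N]} f_i(x) \le f_{\mathrm{smax}}(x) \le \max_{i\in[N]} f_i(x) + \epsilon' \log N$, so taking $\epsilon' = \epsilon/(2\log N)$ yields an $\epsilon/2$-accurate approximation of the maximal loss whose smoothness scales as $1/\epsilon'$ when the $f_i$ are $O(1)$-Lipschitz; efficiently minimizing this surrogate over small balls is what the oracle implementation must accomplish.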
