Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots,f_N$. For non-smooth functions, existing methods require $O(N\epsilon^{-2})$ queries to a first-order oracle to compute an $\epsilon$-suboptimal point and $\tilde{O}(N\epsilon^{-1})$ queries if the $f_i$ are $O(1/\epsilon)$-smooth. We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \epsilon^{-8/3})$ in the non-smooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the $O(1/\epsilon)$-smooth case. Our methods consist of a recently proposed ball optimization oracle acceleration algorithm (which we refine) and a careful implementation of said oracle for the softmax function. We also prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.
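To make the two ingredients concrete: a ball optimization oracle returns an (approximate) minimizer of the objective over a small ball around the query point, and the softmax is the standard log-sum-exp smoothing of the maximal loss. The sketch below states both; the regularization level $\epsilon'$ and its setting $\epsilon' = \epsilon/(2\log N)$ are illustrative choices for this sketch rather than the paper's exact parameters.

\[
\mathcal{O}_r(\bar{x}) \;\approx\; \operatorname*{argmin}_{x \,:\, \|x-\bar{x}\| \le r} f(x),
\qquad
f_{\mathrm{smax}}(x) \;=\; \epsilon' \log \sum_{i \in [N]} \exp\!\left(\frac{f_i(x)}{\epsilon'}\right).
\]

The log-sum-exp surrogate satisfies $\max_{i\in[N]} f_i(x) \le f_{\mathrm{smax}}(x) \le \max_{i\in[N]} f_i(x) + \epsilon' \log N$, so taking $\epsilon' = \epsilon/(2\log N)$ yields an $\epsilon/2$-accurate approximation of the maximal loss whose smoothness scales as $1/\epsilon'$ when the $f_i$ are $O(1)$-Lipschitz; efficiently minimizing this surrogate over small balls is what the oracle implementation must accomplish.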
