Acceleration with a Ball Optimization Oracle

Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}{\epsilon}$ calls to the oracle suffice to find an $\epsilon$-approximate minimizer of $f$ in the $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $\epsilon$-approximate minimizer with roughly $r^{-2/3} \log \frac{1}{\epsilon}$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with locally stable Hessians using a variant of Newton's method. The resulting algorithm applies to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression.
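To make the setup concrete, below is a minimal Python sketch of the straightforward outer loop behind the $r^{-1}\log\frac{1}{\epsilon}$ bound: by convexity, each ball step contracts the optimality gap by roughly a $(1-r)$ factor, so repeatedly re-centering the oracle at the current iterate reaches accuracy $\epsilon$ in about $r^{-1}\log\frac{1}{\epsilon}$ calls. The oracle here is a cheap stand-in built from projected gradient descent (the paper instead implements it via a variant of Newton's method and accelerates the outer loop); the function names, the synthetic logistic-regression instance, and all step-size choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative instance: logistic regression, one of the applications
# mentioned in the abstract. (Synthetic data; not from the paper.)
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = rng.choice([-1.0, 1.0], size=200)

def f(x):
    return np.mean(np.log1p(np.exp(-b * (A @ x))))

def grad(x):
    return A.T @ (-b / (1.0 + np.exp(b * (A @ x)))) / len(b)

def ball_oracle(grad, x, r, steps=500):
    """Approximate ball optimization oracle: minimize f over the l2 ball of
    radius r around x. Implemented here as projected gradient descent, a
    cheap stand-in for the Newton-based oracle the paper analyzes."""
    y = x.copy()
    for t in range(1, steps + 1):
        y = y - (r / t) * grad(y)   # shrinking step size (heuristic choice)
        d = y - x
        n = np.linalg.norm(d)
        if n > r:                   # project back onto the ball around x
            y = x + (r / n) * d
    return y

def minimize_with_ball_oracle(grad, x0, r, calls):
    """Naive outer loop: roughly r^{-1} log(1/eps) oracle calls suffice;
    the accelerated method in the paper needs only ~ r^{-2/3} log(1/eps).
    The unit-ball constraint on the outer iterates is omitted for brevity."""
    x = x0
    for _ in range(calls):
        x = ball_oracle(grad, x, r)
    return x

x = minimize_with_ball_oracle(grad, np.zeros(5), r=0.1, calls=30)
print(f"f(x) after 30 oracle calls: {f(x):.4f}")
```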
