From the simplex to the sphere: Faster constrained optimization using the Hadamard parametrization

We show how to convert the problem of minimizing a convex function over the standard probability simplex into that of minimizing a nonconvex function over the unit sphere. We prove that the landscape of this nonconvex problem is benign, i.e., every stationary point is either a strict saddle point or a global minimizer. We exploit the Riemannian manifold structure of the sphere to propose several new algorithms for this problem. When used in conjunction with line search, our methods achieve a linear rate of convergence at non-degenerate interior points, both in theory and in practice. Extensive numerical experiments compare the performance of our proposed methods with existing methods, highlighting their relative strengths and weaknesses. We conclude with recommendations for practitioners.
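The conversion described above rests on the Hadamard parametrization x = u ∘ u, which maps the unit sphere onto the probability simplex (since ‖u‖ = 1 implies the entries of u ∘ u are nonnegative and sum to one). The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a least-squares objective, a fixed step size, and plain Riemannian gradient descent with renormalization as the retraction.

```python
import numpy as np

# Sketch (assumed setup): minimize f(x) = ||Ax - b||^2 / 2 over the
# probability simplex by parametrizing x = u * u with u on the unit
# sphere, then running Riemannian gradient descent on g(u) = f(u * u).

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_star = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # interior simplex point
b = A @ x_star                                  # so x_star is the minimizer

def grad_f(x):
    return A.T @ (A @ x - b)

u = np.ones(5) / np.sqrt(5)          # feasible start: uniform distribution
for _ in range(2000):
    x = u * u                        # Hadamard parametrization: x lies in the simplex
    g = 2 * u * grad_f(x)            # Euclidean gradient of g(u) = f(u * u), by the chain rule
    g_tan = g - (g @ u) * u          # project onto the tangent space of the sphere at u
    u = u - 0.01 * g_tan             # fixed-step gradient step along the tangent direction
    u = u / np.linalg.norm(u)        # retraction: renormalize back onto the sphere

print(u * u)                         # recovered simplex point, close to x_star
```

Note that u and -u map to the same simplex point, so the sphere problem's minimizers come in sign-flipped pairs; the benign-landscape result guarantees that, despite this nonconvexity, no spurious local minimizers arise.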

[1]  Immanuel M. Bomze,et al.  Regularity versus Degeneracy in Dynamics, Games, and Optimization: A Unified Approach to Different Aspects , 2002, SIAM Rev..

[2]  A. Tsybakov,et al.  SPADES AND MIXTURE MODELS , 2009, 0901.2044.

[3]  Varun Kanade,et al.  Implicit Regularization for Optimal Sparse Recovery , 2019, NeurIPS.

[4]  Nicolas Boumal,et al.  Efficiently escaping saddle points on manifolds , 2019, NeurIPS.

[5]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[6]  Etienne de Klerk,et al.  The complexity of optimizing over a simplex, hypercube or sphere: a short survey , 2008, Central Eur. J. Oper. Res..

[7]  Steffen Limmer,et al.  A Neural Architecture for Bayesian Compressive Sensing Over the Simplex via Laplace Techniques , 2018, IEEE Transactions on Signal Processing.

[8]  P. Zhao,et al.  Implicit regularization via hadamard product over-parametrization in high-dimensional linear regression , 2019 .

[9]  Yunmei Chen,et al.  Projection Onto A Simplex , 2011, 1101.6081.

[10]  BoydStephen,et al.  An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007 .

[11]  Arkadi Nemirovski,et al.  The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography , 2001, SIAM J. Optim..

[12]  Yurii Nesterov,et al.  Cubic regularization of Newton method and its global performance , 2006, Math. Program..

[13]  Marc Teboulle,et al.  Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..

[14]  Yaguang Yang Globally Convergent Optimization Algorithms on Riemannian Manifolds: Uniform Framework for Unconstrained and Constrained Optimization , 2007 .

[15]  John Wright,et al.  Complete dictionary recovery over the sphere , 2015, 2015 International Conference on Sampling Theory and Applications (SampTA).

[16]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[17]  Yoram Singer,et al.  Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.

[18]  Martin Jaggi,et al.  Step-Size Adaptivity in Projection-Free Optimization , 2018 .

[19]  Lorenzo Rosasco,et al.  On regularization algorithms in learning theory , 2007, J. Complex..

[20]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[21]  Michel Barlaud,et al.  A filtered bucket-clustering method for projection onto the simplex and the ℓ1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{doc , 2019, Mathematical Programming.

[22]  Martin Jaggi,et al.  On the Global Linear Convergence of Frank-Wolfe Optimization Variants , 2015, NIPS.

[23]  Manuel Blum,et al.  Time Bounds for Selection , 1973, J. Comput. Syst. Sci..

[24]  E. C. Zeeman,et al.  Population dynamics from game theory , 1980 .

[25]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[26]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[27]  Michael Biehl,et al.  Workshop New Challenges in Neural Computation , 2011 .

[28]  Shuzhong Zhang,et al.  A Cubic Regularized Newton's Method over Riemannian Manifolds , 2018, 1805.05565.

[29]  Levent Tunçel,et al.  Optimization algorithms on matrix manifolds , 2009, Math. Comput..

[30]  Nirmal Keshava,et al.  A Survey of Spectral Unmixing Algorithms , 2003 .

[31]  D. Luenberger The Gradient Projection Method Along Geodesics , 1972 .

[32]  J. Zico Kolter,et al.  A Continuous-Time View of Early Stopping for Least Squares Regression , 2018, AISTATS.

[33]  Yoram Singer,et al.  Efficient Learning of Label Ranking by Soft Projections onto Polyhedra , 2006, J. Mach. Learn. Res..

[34]  William W. Hager,et al.  A Nonmonotone Line Search Technique and Its Application to Unconstrained Optimization , 2004, SIAM J. Optim..

[35]  Wotao Yin,et al.  A feasible method for optimization with orthogonality constraints , 2013, Math. Program..

[36]  Laurent Condat Fast projection onto the simplex and the l1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pmb {l}_\mathbf {1}$$\end{ , 2015, Mathematical Programming.