From Symmetry to Geometry: Tractable Nonconvex Problems

As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While the challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems that can be understood through the lens of symmetry. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single "ground truth" solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in shaping the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research.
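To make the symmetry-to-geometry picture concrete, the sketch below runs plain gradient descent on a toy real-valued phase retrieval objective. It is an illustration under assumed choices (Gaussian measurement vectors `a`, a unit-norm ground truth `x_star`, a fixed step size), not code from the survey. Because the objective depends on x only through the squared measurements (a_i^T x)^2, it is invariant under the sign symmetry x ↦ −x, so a random initialization should converge to one of the two symmetric copies ±x_star, while x = 0 is a critical point with negative curvature along the ground-truth direction.

```python
# Minimal sketch (assumed setup, not from the survey): gradient descent on the
# toy objective f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2 with y_i = (a_i^T x_star)^2.
# The sign symmetry x -> -x leaves f unchanged, so global minimizers come in the
# pair {x_star, -x_star}; x = 0 is a critical point with negative curvature.
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200                       # signal dimension, number of measurements
a = rng.standard_normal((m, n))      # Gaussian measurement vectors a_i
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)     # ground truth, normalized to unit norm
y = (a @ x_star) ** 2                # phaseless (sign-less) measurements

def grad(x):
    """Gradient of f at x: (1/m) * sum_i ((a_i^T x)^2 - y_i) (a_i^T x) a_i."""
    r = a @ x
    return (a.T @ ((r ** 2 - y) * r)) / m

x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # random initialization, no spectral step
for _ in range(2000):
    x -= 0.05 * grad(x)              # conservative step size for this toy scale

# Report distance to the closer of the two symmetric copies of the ground truth.
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(f"distance to {{x_star, -x_star}}: {err:.2e}")
```

With this kind of oversampling (m = 10n), the iterates typically escape the negative-curvature region around x = 0 and settle near whichever of ±x_star is closer to the initialization; which copy is recovered depends only on the random start, reflecting the symmetry rather than any defect of the method.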
