Alternating Minimizations Converge to Second-Order Optimal Solutions

This work studies the second-order convergence of both standard alternating minimization and proximal alternating minimization. We show that, under mild assumptions on the (nonconvex) objective function, both algorithms avoid strict saddle points almost surely under random initialization. Combined with known first-order convergence results, this implies that both algorithms converge to a second-order stationary point. This resolves an open problem on the second-order convergence of alternating minimization algorithms, which are widely used in practice to solve large-scale nonconvex problems thanks to their simple implementation, fast convergence, and strong empirical performance.
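To make the setup concrete, here is a minimal sketch (our own illustration, not code from the paper) of two-block alternating minimization applied to the standard nonconvex low-rank factorization objective f(U, V) = ½‖M − UVᵀ‖²_F, a problem class the abstract alludes to. The function name, the example objective, and the `prox` parameter are assumptions for illustration; setting `prox > 0` turns each block update into the proximal variant by adding a quadratic proximal term to the block subproblem.

```python
import numpy as np

def alternating_minimization(M, rank, iters=200, prox=0.0, seed=0):
    """Sketch of (proximal) alternating minimization for
    f(U, V) = 0.5 * ||M - U V^T||_F^2.

    prox = 0 gives standard alternating minimization; prox > 0 adds the
    proximal term (prox/2) * ||U - U_prev||_F^2 (resp. V) to each block
    subproblem. Both block updates have closed-form least-squares solutions.
    """
    rng = np.random.default_rng(seed)        # random initialization
    m, n = M.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    I = np.eye(rank)
    for _ in range(iters):
        # U-step: argmin_U 0.5*||M - U V^T||^2 + (prox/2)*||U - U_prev||^2
        U = (M @ V + prox * U) @ np.linalg.inv(V.T @ V + prox * I)
        # V-step: argmin_V 0.5*||M - U V^T||^2 + (prox/2)*||V - V_prev||^2
        V = (M.T @ U + prox * V) @ np.linalg.inv(U.T @ U + prox * I)
    return U, V

# Usage: factor a rank-3 matrix; the residual should be near zero.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
U, V = alternating_minimization(M, rank=3, prox=1e-3)
print(np.linalg.norm(M - U @ V.T))
```

With `prox = 0` the block Gram matrices must be invertible; the small positive `prox` used above keeps the updates well defined and corresponds to the proximal variant discussed in the abstract.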
