The Landscape of Matrix Factorization Revisited

We revisit the landscape of the simple matrix factorization problem. For low-rank matrix factorization, prior work has shown that there exist infinitely many critical points, all of which are either global minima or strict saddles. At a strict saddle the minimum eigenvalue of the Hessian is negative. A natural question is whether this minimum eigenvalue is uniformly bounded away from zero over the set of strict saddles. To answer this, we consider the orbits of critical points under the action of the general linear group. For each orbit we identify a representative, called a canonical point; if a canonical point is a strict saddle, so is every point on its orbit. We derive an expression for the minimum eigenvalue of the Hessian at each canonical strict saddle and use it to show that the minimum Hessian eigenvalue over the set of strict saddles is not uniformly bounded away from zero. We also show that a known invariance property of gradient flow ensures that a trajectory of gradient flow encounters only those critical points lying on an invariant manifold $\mathcal{M}_C$ determined by the initial condition. In contrast to the general situation, we show that the minimum Hessian eigenvalue at strict saddles in $\mathcal{M}_0$ is uniformly bounded away from zero, and we obtain an expression for this bound in terms of the singular values of the matrix being factorized. The bound depends on the magnitudes of the nonzero singular values and on the separation between distinct nonzero singular values.
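
To make the two headline claims concrete, here is a minimal numerical sketch (an illustration, not code or notation from the paper; the objective $f(A,B) = \frac{1}{2}\|X - AB\|_F^2$, the $2 \times 2$ example $X = \mathrm{diag}(2, 1)$, and all variable names are assumptions made for this sketch). It checks that $C = A^\top A - BB^\top$ is conserved along (discretized) gradient flow, and that along the orbit $(tA, B/t)$ of a strict saddle the minimum Hessian eigenvalue decays toward zero.

```python
import numpy as np

# Minimal sketch for f(A, B) = 0.5 * ||X - A B||_F^2 with rank-1 factors of a
# hypothetical 2x2 matrix X = diag(2, 1) (singular values s1 = 2, s2 = 1).
# It illustrates:
#   (1) C = A^T A - B B^T is conserved by gradient flow, so a trajectory stays
#       on the invariant manifold M_C fixed by its initialization;
#   (2) along the orbit (t*A, B/t) of a strict saddle under the (here scalar)
#       general linear group, the minimum Hessian eigenvalue tends to 0, so it
#       is not uniformly bounded away from zero over all strict saddles.

X = np.diag([2.0, 1.0])
m, k, n = 2, 1, 2

def grads(A, B):
    R = X - A @ B                            # residual
    return -R @ B.T, -A.T @ R                # gradients w.r.t. A and B

def hessian(A, B):
    """Dense Hessian of f at (A, B), assembled from exact Hessian-vector products."""
    def hvp(dA, dB):
        R = X - A @ B
        dAB = dA @ B + A @ dB
        return (dAB @ B.T - R @ dB.T,        # directional derivative of grad_A
                A.T @ dAB - dA.T @ R)        # directional derivative of grad_B
    dim = m * k + k * n
    H = np.zeros((dim, dim))
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = 1.0
        dA, dB = e[:m * k].reshape(m, k), e[m * k:].reshape(k, n)
        HA, HB = hvp(dA, dB)
        H[:, i] = np.concatenate([HA.ravel(), HB.ravel()])
    return H

# (1) Invariance of C under discretized gradient flow from a random start.
#     The continuous flow conserves C exactly; the Euler drift is O(step size).
rng = np.random.default_rng(0)
A, B = rng.standard_normal((m, k)), rng.standard_normal((k, n))
C0 = A.T @ A - B @ B.T
for _ in range(5000):
    gA, gB = grads(A, B)
    A, B = A - 1e-3 * gA, B - 1e-3 * gB
print("drift of C:", np.linalg.norm(A.T @ A - B @ B.T - C0))

# (2) A strict saddle with A B = s2 * u2 v2^T (second singular pair only),
#     swept along its orbit (t*A0, B0/t).
A0, B0 = np.array([[0.0], [1.0]]), np.array([[0.0, 1.0]])
for t in [1.0, 3.0, 10.0, 30.0]:
    lam = np.linalg.eigvalsh(hessian(A0 * t, B0 / t)).min()
    print(f"t = {t:5.1f}  min Hessian eigenvalue = {lam:.4f}")
```

In this example, at $t = 1$ the saddle is balanced (it lies on $\mathcal{M}_0$) and the minimum Hessian eigenvalue equals $-(s_1 - s_2) = -1$, consistent with a bound governed by the separation between distinct nonzero singular values; as $t$ grows the eigenvalue rises toward zero, illustrating the failure of a uniform bound over the full set of strict saddles.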
