On the stability of gradient flow dynamics for a rank-one matrix approximation problem

In this paper, we examine the global stability of the gradient flow dynamics associated with the problem of finding the best rank-one approximation of a given matrix. We partition the state space into an infinite family of invariant manifolds on which the dynamics reduce to the special case of approximating a symmetric matrix. This reduction allows us to employ a Lyapunov-based argument to explicitly characterize the region of attraction of the stable equilibrium points. This characterization establishes almost everywhere convergence of the gradient flow dynamics to the minimizers of the corresponding rank-one approximation problem.
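
A minimal numerical sketch of the dynamics described above, assuming the standard objective f(x, y) = 0.5 * ||A - x y^T||_F^2 and a simple forward-Euler discretization of the gradient flow; the step size, problem dimensions, and random seed below are illustrative choices, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
A = rng.standard_normal((m, n))            # matrix to be approximated

# Gradient flow for f(x, y) = 0.5 * ||A - x y^T||_F^2:
#   xdot = -(x y^T - A) y,   ydot = -(x y^T - A)^T x.
x = rng.standard_normal(m)
y = rng.standard_normal(n)
dt = 1e-2                                  # forward-Euler step size (assumed)
for _ in range(20000):
    r = np.outer(x, y) - A                 # residual x y^T - A
    x, y = x - dt * (r @ y), y - dt * (r.T @ x)

# Compare with the best rank-one approximation given by the leading singular pair.
U, s, Vt = np.linalg.svd(A)
best = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(np.outer(x, y) - best))   # small for generic initializations

For generic random initializations, the simulated trajectory approaches the outer product of the leading singular pair, consistent with the almost everywhere convergence result summarized in the abstract.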
