Computing Matrix Squareroot via Non Convex Local Search

We consider the problem of computing the square root of a positive semidefinite (PSD) matrix. Several fast algorithms, some based on eigenvalue decomposition and some on Taylor expansion, are known to solve this problem. In this paper, we propose another approach: a natural algorithm that performs gradient descent on a non-convex formulation of the matrix square root problem. We show that on an $n\times n$ input PSD matrix ${M}$, if the initial point is well conditioned, then the algorithm finds an $\epsilon$-accurate solution in $O\left(\kappa^{3/2} \log \frac{\left\|{M}\right\|_F}{\epsilon}\right)$ iterations, where $\kappa$ is the condition number of ${M}$. Each iteration involves three matrix multiplications (and uses neither matrix inversions nor solutions of linear systems), giving a total run time of $O\left(n^{\omega}\kappa^{3/2}\log\frac{\left\|{M}\right\|_F}{\epsilon}\right)$, where $\omega$ is the matrix multiplication exponent. Furthermore, we show that our algorithm is robust to errors in each iteration. We also show a lower bound of $\Omega(\kappa)$ iterations for our algorithm, demonstrating that the dependence of our result on $\kappa$ is necessary. Existing analyses of similar algorithms (e.g., Newton's method) require the input matrix to commute with each iterate of the algorithm, which is ensured by choosing the starting iterate carefully. Our analysis, on the other hand, is much more general and does not require each iterate to commute with the input matrix. Consequently, our result guarantees convergence from a wide range of starting points. More generally, our result demonstrates that non-convex optimization can be a viable approach to obtaining fast and robust algorithms. Our argument is quite general, and we believe it will find application in designing such algorithms for other problems in numerical linear algebra.
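The iteration described above can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: it assumes the non-convex objective is $f(U) = \left\|M - U^2\right\|_F^2$ over symmetric $U$, whose gradient is proportional to $(U^2 - M)U + U(U^2 - M)$, so one step costs three matrix multiplications. The function name `sqrt_psd_gd`, the scaled-identity starting point, and the step size $0.1/\left\|M\right\|_F$ are hypothetical choices made for this example. Plain Python lists are used only to keep the sketch dependency-free.

```python
def matmul(A, B):
    """Dense matrix product of two lists-of-lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def frob_norm(A):
    """Frobenius norm of a matrix."""
    return sum(x * x for row in A for x in row) ** 0.5


def sqrt_psd_gd(M, eta=None, tol=1e-10, max_iter=100_000):
    """Gradient descent on f(U) = ||M - U^2||_F^2 (assumed objective).

    Returns U with ||U^2 - M||_F <= tol, or the last iterate if
    max_iter is reached first.
    """
    n = len(M)
    norm_M = frob_norm(M)
    if norm_M == 0.0:
        return [[0.0] * n for _ in range(n)]

    # Assumption: start from a well-conditioned point, sqrt(||M||_F) * I.
    scale = norm_M ** 0.5
    U = [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Assumption: a conservative constant step size (the factor 2 of the
    # gradient is absorbed into eta).
    if eta is None:
        eta = 0.1 / norm_M

    for _ in range(max_iter):
        U2 = matmul(U, U)                                   # multiplication 1
        R = [[U2[i][j] - M[i][j] for j in range(n)] for i in range(n)]
        if frob_norm(R) <= tol:
            break
        RU = matmul(R, U)                                   # multiplication 2
        UR = matmul(U, R)                                   # multiplication 3
        U = [[U[i][j] - eta * (RU[i][j] + UR[i][j]) for j in range(n)]
             for i in range(n)]
    return U


if __name__ == "__main__":
    M = [[2.0, 1.0], [1.0, 2.0]]
    U = sqrt_psd_gd(M)
    print(U)
    print(matmul(U, U))  # should be close to M
```

Note that the scaled-identity start used here happens to commute with $M$; it is chosen only for simplicity, and the abstract's claim is that such commutativity is not needed for convergence. The loop uses no inversions or linear solves, matching the per-iteration cost stated above.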
