Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation

It has been observed in a variety of contexts that gradient descent methods have great success in solving low-rank matrix factorization problems, despite the non-convexity of the underlying formulation. We tackle a particular instance of this scenario: estimating the $d$-dimensional subspace spanned by a streaming data matrix. We apply the natural first-order incremental gradient descent method, constraining the iterates to the Grassmannian. We propose an adaptive step size scheme that is greedy in the noiseless case, maximizing the improvement of our convergence metric at each data index $t$, and that yields an expected improvement in the noisy case. We show that, with noise-free data, this method converges from any random initialization to the global minimum of the problem. For noisy data, we provide the expected per-iteration convergence rate of the proposed algorithm.
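For concreteness, below is a minimal NumPy sketch of one incremental gradient step on the Grassmannian of the kind described above, assuming fully observed samples and using a greedy step size that rotates the current basis so a noise-free sample lies exactly in the updated subspace. The function name and implementation details are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

def grouse_greedy_step(U, v, eps=1e-12):
    """One GROUSE-style incremental step on the Grassmannian (illustrative sketch).

    U : (n, d) array with orthonormal columns, the current subspace estimate.
    v : (n,) fully observed data vector.
    The greedy step size theta = arctan(||r|| / ||p||) fully absorbs v,
    so span(U_new) contains v when the data are noise free.
    """
    w = U.T @ v                          # least-squares weights of v in span(U)
    p = U @ w                            # projection of v onto span(U)
    r = v - p                            # residual orthogonal to span(U)
    p_norm, r_norm, w_norm = map(np.linalg.norm, (p, r, w))
    if r_norm < eps or w_norm < eps or p_norm < eps:
        return U                         # v already (numerically) lies in the subspace
    theta = np.arctan(r_norm / p_norm)   # greedy step size for the noiseless case
    # Rank-one geodesic-style update; columns of the result remain orthonormal.
    return U + np.outer((np.cos(theta) - 1.0) * p / p_norm
                        + np.sin(theta) * r / r_norm,
                        w / w_norm)

# Toy usage: stream noise-free samples from a fixed 5-dimensional subspace of R^50.
rng = np.random.default_rng(0)
n, d = 50, 5
U_true, _ = np.linalg.qr(rng.standard_normal((n, d)))
U, _ = np.linalg.qr(rng.standard_normal((n, d)))    # random initialization
for _ in range(2000):
    v = U_true @ rng.standard_normal(d)              # noise-free streaming sample
    U = grouse_greedy_step(U, v)
# 1 - cos(largest principal angle) between the true and estimated subspaces.
err = 1.0 - np.linalg.svd(U_true.T @ U, compute_uv=False).min()
print(f"subspace error: {err:.2e}")
```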
