Nonconvex Matrix Factorization from Rank-One Measurements

We consider the problem of recovering low-rank matrices from random rank-one measurements, a setting that spans numerous applications including covariance sketching, phase retrieval, quantum state tomography, and learning shallow polynomial neural networks. Our approach is to estimate the low-rank factor directly by minimizing a nonconvex quadratic loss function via vanilla gradient descent, following a tailored spectral initialization. When the true rank is small, this algorithm is guaranteed to converge to the ground truth (up to global ambiguity) with near-optimal sample complexity and computational complexity. To the best of our knowledge, this is the first guarantee that achieves near-optimality in both metrics. In particular, the key enabler of the near-optimal computational guarantee is an implicit regularization phenomenon: without explicit regularization, both the spectral initialization and the gradient descent iterates automatically stay within a region incoherent with the measurement vectors. This feature allows one to employ much more aggressive step sizes than those suggested in the prior literature, without the need for sample splitting.
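
To make the two-stage procedure concrete, here is a minimal numerical sketch (Python/NumPy) for symmetric rank-one measurements y_i = a_i^T X a_i with X = UU^T positive semidefinite of rank r. The function names, step size, iteration count, and sample size below are illustrative choices, not the paper's prescriptions, and the spectral initialization is a simplified variant: it takes the top-r eigenpairs of (1/m) Σ_i y_i a_i a_i^T after a crude trace debiasing, omitting the refinements used in the formal analysis.

```python
import numpy as np

def spectral_init(A, y, r):
    """Simplified spectral initialization (illustrative).

    Forms Y = (1/m) * sum_i y_i * a_i a_i^T and returns its top-r
    eigenpairs. Under Gaussian designs E[Y] = 2*X + tr(X)*I and
    E[y_i] = tr(X), so subtracting mean(y)*I debiases the trace term;
    the eigenvalues of the debiased matrix then concentrate around
    2*lambda_j(X), hence the sqrt(w/2) column scaling.
    """
    m, n = A.shape
    Y = (A.T * y) @ A / m              # (1/m) * sum_i y_i a_i a_i^T
    Y -= np.mean(y) * np.eye(n)        # crude trace debiasing (assumption)
    w, V = np.linalg.eigh(Y)           # eigenvalues in ascending order
    w, V = w[::-1][:r], V[:, ::-1][:, :r]
    return V * np.sqrt(np.maximum(w, 0.0) / 2.0)

def gradient_descent(A, y, r, eta=0.05, iters=2000):
    """Vanilla gradient descent on the nonconvex quadratic loss
    f(U) = (1/4m) * sum_i (a_i^T U U^T a_i - y_i)^2,
    with no explicit regularization, as described in the abstract."""
    m = len(y)
    U = spectral_init(A, y, r)
    for _ in range(iters):
        AU = A @ U                          # shape (m, r)
        res = np.sum(AU**2, axis=1) - y     # residuals a_i^T UU^T a_i - y_i
        grad = (A.T * res) @ AU / m         # (1/m) sum_i res_i a_i a_i^T U
        U -= eta * grad
    return U

# Synthetic sanity check with a heuristic sample size m ~ 10*n*r.
rng = np.random.default_rng(0)
n, r = 50, 3
m = 10 * n * r
U_star = rng.standard_normal((n, r)) / np.sqrt(n)
A = rng.standard_normal((m, n))
y = np.sum((A @ U_star)**2, axis=1)         # y_i = a_i^T U* U*^T a_i
U_hat = gradient_descent(A, y, r)
err = np.linalg.norm(U_hat @ U_hat.T - U_star @ U_star.T, 'fro')
print(f"relative error: {err / np.linalg.norm(U_star @ U_star.T, 'fro'):.2e}")
```

Note that the error is measured through UU^T rather than U itself, which sidesteps the global ambiguity U ↦ UQ (for orthonormal Q) mentioned in the abstract.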
