A Non-Euclidean Gradient Descent Framework for Non-Convex Matrix Factorization

We study convex optimization problems whose solutions are low-rank matrices. In such scenarios, non-convex methods offer significant advantages over convex methods: lower space complexity and faster convergence in practice. Under mild assumptions, these methods have global convergence guarantees. In this paper, we extend these results along a different path. We derive a non-Euclidean optimization framework for the non-convex setting that takes nonlinear gradient steps on the factors. Our framework makes it possible to further exploit underlying problem structure, such as sparsity or low-rankness in the factorized domain, or a better dimensional dependence of the smoothness parameters of the objective. We prove that, under appropriate assumptions, the non-Euclidean methods enjoy the same rigorous guarantees as their Euclidean counterparts. Numerical evidence from Fourier ptychography and FastText applications on real data shows that our approach can improve both solution quality and convergence speed over standard non-convex approaches.
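To make the idea of a nonlinear gradient step on the factors concrete, the following is a minimal sketch, not the algorithm analyzed in the paper. It assumes a toy objective g(U) = (1/4)‖UUᵀ − M‖_F² for a symmetric matrix M, and it assumes the spectral norm as the non-Euclidean geometry, for which the steepest-descent direction is the gradient's nuclear norm times the product of its singular-vector matrices. The function names, the toy objective, and the backtracking step-size rule are all illustrative choices introduced here.

```python
import numpy as np


def sharp_spectral(G):
    """Steepest-descent direction of G under the spectral norm.

    For G = P diag(s) Q^T this returns ||G||_* * P Q^T, where ||G||_* is
    the nuclear norm (sum of singular values).
    """
    P, s, Qt = np.linalg.svd(G, full_matrices=False)
    return s.sum() * (P @ Qt)


def objective(U, M):
    """Toy factored objective g(U) = 0.25 * ||U U^T - M||_F^2 (assumed)."""
    return 0.25 * np.linalg.norm(U @ U.T - M, "fro") ** 2


def gradient(U, M):
    """Gradient of the toy objective with respect to U, for symmetric M."""
    return (U @ U.T - M) @ U


def non_euclidean_descent(M, rank, iters=50, seed=0):
    """Descent on the factor U using the nonlinear direction sharp_spectral.

    The step size is found by simple halving until the objective decreases;
    this backtracking rule is an illustrative choice, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((M.shape[0], rank))
    history = [objective(U, M)]
    for _ in range(iters):
        D = sharp_spectral(gradient(U, M))
        step = 1.0
        # Halve the step until the objective strictly decreases.
        while step > 1e-12 and objective(U - step * D, M) >= history[-1]:
            step *= 0.5
        if step <= 1e-12:
            break  # no decreasing step found; stop
        U = U - step * D
        history.append(objective(U, M))
    return U, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 2))
    M = A @ A.T  # rank-2 positive semidefinite target
    U, history = non_euclidean_descent(M, rank=2)
    print("initial objective:", history[0])
    print("final objective:  ", history[-1])
```

Replacing `sharp_spectral(gradient(U, M))` with `gradient(U, M)` recovers the ordinary Euclidean factored gradient step, which is the baseline the non-Euclidean direction is meant to be compared against.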
