Towards Provable Learning of Polynomial Neural Networks Using Low-Rank Matrix Estimation

We study the problem of (provably) learning the weights of a two-layer neural network with quadratic activations. In particular, we focus on the under-parametrized regime, where the number of neurons in the hidden layer is (much) smaller than the dimension of the input. Our approach uses a lifting trick, which enables us to borrow algorithmic ideas from low-rank matrix estimation. In this context, we propose two novel nonconvex training algorithms that require no tuning parameters other than the number of hidden neurons. We support our algorithms with rigorous theoretical analysis and show that they enjoy linear convergence, fast per-iteration running time, and near-optimal sample complexity. Finally, we complement our theoretical results with several numerical experiments.
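
The lifting step itself fits in one line. In notation introduced here purely for illustration (none of these symbols appear in the abstract): write d for the input dimension, k ≪ d for the number of hidden neurons, w_1, …, w_k ∈ R^d for the hidden-layer weights, and α_1, …, α_k for the output-layer weights. A quadratic-activation network then computes

```latex
f(x) \;=\; \sum_{j=1}^{k} \alpha_j \big(w_j^\top x\big)^2
     \;=\; x^\top \Big(\underbrace{\textstyle\sum_{j=1}^{k} \alpha_j\, w_j w_j^\top}_{=:\,M^\star}\Big)\, x,
\qquad \operatorname{rank}\big(M^\star\big) \le k.
```

Each labeled sample (x_i, y_i) therefore supplies a rank-one linear measurement y_i = ⟨x_i x_i^⊤, M^⋆⟩ of the lifted matrix M^⋆, so learning the network weights reduces to estimating a rank-k matrix, and factoring an estimate of M^⋆ recovers the hidden weights up to the usual sign, scaling, and rotation ambiguities.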

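The abstract does not spell out the two training algorithms, so the sketch below is only an illustration of the kind of tuning-free nonconvex iteration this lifting enables: plain gradient descent on the lifted squared loss over a rank-k factor U (so that M = U U^⊤), under the simplifying assumption that the output weights are positive and M^⋆ is therefore positive semidefinite. The function name, random initialization, and step-size heuristic are ours, not the paper's.

```python
import numpy as np

def lifted_factored_gd(X, y, k, step=None, iters=500, seed=0):
    """Sketch: estimate a rank-k PSD lift M = U U^T from quadratic samples
    y_i = x_i^T M* x_i by gradient descent on the factored squared loss
    (1/2m) * sum_i (x_i^T U U^T x_i - y_i)^2.  Illustrative only; not
    necessarily either of the paper's algorithms.

    X : (m, d) array of inputs.
    y : (m,) array of network outputs.
    k : number of hidden neurons (the only model parameter).
    """
    m, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, k)) / np.sqrt(d)   # random init (assumption)
    if step is None:
        # Crude heuristic we added: scale the step by the data's spectral norm.
        step = m / np.linalg.norm(X, ord=2) ** 2
    for _ in range(iters):
        Z = X @ U                                  # rows z_i = U^T x_i
        r = np.einsum('ij,ij->i', Z, Z) - y        # residuals x_i^T U U^T x_i - y_i
        G = (2.0 / m) * (X.T @ (r[:, None] * Z))   # gradient of the loss w.r.t. U
        U -= step * G
    return U   # eigendecompose U @ U.T to read off hidden weights

```

Per iteration, the dominant costs are the two matrix products X @ U and X.T @ (r[:, None] * Z), i.e. O(mdk) time with no SVD, which matches the flavor of the fast per-iteration running time the abstract advertises; only k must be supplied, consistent with the claim that no other tuning parameters are needed (the step size above is a heuristic we added for the sketch).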