Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Let <tex>$V$</tex> be any vector space of multivariate degree-<tex>$d$</tex> homogeneous polynomials with co-dimension at most <tex>$k$</tex>, and let <tex>$S$</tex> be the set of points where all polynomials in <tex>$V$</tex> nearly vanish. We establish a qualitatively optimal upper bound on the size of <tex>$\epsilon$</tex>-covers for <tex>$S$</tex> in the <tex>$\ell_{2}$</tex>-norm. Roughly speaking, we show that there exists an <tex>$\epsilon$</tex>-cover for <tex>$S$</tex> of cardinality <tex>$M=(k/\epsilon)^{O_{d}(k^{1/d})}$</tex>. Our result is constructive, yielding an algorithm to compute such an <tex>$\epsilon$</tex>-cover that runs in time <tex>$\text{poly}(M)$</tex>. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for <tex>$k$</tex>-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with <tex>$k$</tex> hidden units (under the Gaussian distribution), density and parameter estimation for <tex>$k$</tex>-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for <tex>$k$</tex>-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter <tex>$k$</tex>. Previous algorithms for these problems had running times exponential in <tex>$k^{\Omega(1)}$</tex>. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
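To make the moment-based step concrete, here is a minimal numerical sketch (not the paper's algorithm) of the simplest, degree-1 instance for a <tex>$k$</tex>-mixture of spherical Gaussians with identity covariance: the shifted second-moment matrix <tex>$\mathbf{E}[xx^{\top}]-I=\sum_{i}w_{i}\mu_{i}\mu_{i}^{\top}$</tex> has rank at most <tex>$k$</tex>, so coefficient vectors in its near-null space define linear polynomials that nearly vanish on every component mean. All concrete choices below (the toy dimensions, the placement of the means, the eigenvalue threshold 0.1) are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (illustrative assumptions): k spherical Gaussians with identity
# covariance in R^n, means placed along the first k coordinate axes.
n, k, N = 20, 3, 200_000
means = np.zeros((k, n))
means[np.arange(k), np.arange(k)] = 5.0
weights = np.full(k, 1.0 / k)

# Draw samples from the mixture.
labels = rng.choice(k, size=N, p=weights)
X = means[labels] + rng.standard_normal((N, n))

# Degree-1 version of the moment step: E[x x^T] - I = sum_i w_i mu_i mu_i^T,
# a PSD matrix of rank at most k. Coefficient vectors c with c^T M c close to 0
# give linear polynomials p(x) = <c, x> with p(mu_i) close to 0 for every i.
M = X.T @ X / N - np.eye(n)
eigvals, eigvecs = np.linalg.eigh(M)
near_null = eigvecs[:, eigvals < 0.1]   # a subspace of co-dimension at most k

# Sanity check against the (in reality unknown) means: the recovered
# polynomials nearly vanish on all of them.
print(near_null.shape[1], "nearly-vanishing linear forms;",
      "max |p(mu_i)| =", np.abs(means @ near_null).max())
```

The paper's full algorithms work with higher-degree moments and then cover the resulting near-zero set; this sketch only illustrates how a low co-dimension space of nearly vanishing polynomials can be read off from empirical moments.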
