Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Let <tex>$V$</tex> be any vector space of multivariate degree-<tex>$d$</tex> homogeneous polynomials with co-dimension at most <tex>$k$</tex>, and let <tex>$S$</tex> be the set of points where all polynomials in <tex>$V$</tex> nearly vanish. We establish a qualitatively optimal upper bound on the size of <tex>$\epsilon$</tex>-covers for <tex>$S$</tex> in the <tex>$\ell_{2}$</tex>-norm. Roughly speaking, we show that there exists an <tex>$\epsilon$</tex>-cover for <tex>$S$</tex> of cardinality <tex>$M=(k/\epsilon)^{O_{d}(k^{1/d})}$</tex>. Our result is constructive, yielding an algorithm to compute such an <tex>$\epsilon$</tex>-cover that runs in time <tex>$\text{poly}(M)$</tex>. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for <tex>$k$</tex>-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with <tex>$k$</tex> hidden units (under the Gaussian distribution), density and parameter estimation for <tex>$k$</tex>-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for <tex>$k$</tex>-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter <tex>$k$</tex>. Previous algorithms for these problems had running times exponential in <tex>$k^{\Omega(1)}$</tex>. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
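To make the moment-based step concrete, here is a minimal numerical sketch (not the paper's algorithm) of the simplest, degree-1 instance for a <tex>$k$</tex>-mixture of spherical Gaussians with identity covariance: the shifted second-moment matrix <tex>$\mathbf{E}[xx^{\top}]-I=\sum_{i}w_{i}\mu_{i}\mu_{i}^{\top}$</tex> has rank at most <tex>$k$</tex>, so coefficient vectors in its near-null space define linear polynomials that nearly vanish on every component mean. All concrete choices below (the toy dimensions, the placement of the means, the eigenvalue threshold 0.1) are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (illustrative assumptions): k spherical Gaussians with identity
# covariance in R^n, means placed along the first k coordinate axes.
n, k, N = 20, 3, 200_000
means = np.zeros((k, n))
means[np.arange(k), np.arange(k)] = 5.0
weights = np.full(k, 1.0 / k)

# Draw samples from the mixture.
labels = rng.choice(k, size=N, p=weights)
X = means[labels] + rng.standard_normal((N, n))

# Degree-1 version of the moment step: E[x x^T] - I = sum_i w_i mu_i mu_i^T,
# a PSD matrix of rank at most k. Coefficient vectors c with c^T M c close to 0
# give linear polynomials p(x) = <c, x> with p(mu_i) close to 0 for every i.
M = X.T @ X / N - np.eye(n)
eigvals, eigvecs = np.linalg.eigh(M)
near_null = eigvecs[:, eigvals < 0.1]   # a subspace of co-dimension at most k

# Sanity check against the (in reality unknown) means: the recovered
# polynomials nearly vanish on all of them.
print(near_null.shape[1], "nearly-vanishing linear forms;",
      "max |p(mu_i)| =", np.abs(means @ near_null).max())
```

The paper's full algorithms work with higher-degree moments and then cover the resulting near-zero set; this sketch only illustrates how a low co-dimension space of nearly vanishing polynomials can be read off from empirical moments.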
