Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model

In the setting of entangled single-sample distributions, the goal is to estimate a common parameter shared by a family of $n$ distributions, given a single sample from each distribution. This paper studies mean estimation for entangled single-sample Gaussians that have a common mean but different unknown variances. We propose the subset-of-signals model, in which an unknown subset of $m$ variances is bounded by 1 while no assumptions are placed on the remaining variances. In this model, we analyze a simple and natural method based on iteratively averaging truncated samples, and show that it achieves error $O \left(\frac{\sqrt{n\ln n}}{m}\right)$ with high probability when $m=\Omega(\sqrt{n\ln n})$, matching existing bounds for this range of $m$. We further prove lower bounds: the error is $\Omega\left(\left(\frac{n}{m^4}\right)^{1/2}\right)$ when $m$ is between $\Omega(\ln n)$ and $O(n^{1/4})$, and $\Omega\left(\left(\frac{n}{m^4}\right)^{1/6}\right)$ when $m$ is between $\Omega(n^{1/4})$ and $O(n^{1 - \epsilon})$ for an arbitrarily small $\epsilon>0$, improving existing lower bounds and extending them to a wider range of $m$.
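To make the "iteratively averaging truncated samples" idea concrete, here is a minimal sketch of one plausible instantiation. It is an illustration, not the paper's exact algorithm: the initialization by the median, the halving schedule, and the radius floor (set to 1, the bound on the signal variances) are all assumptions made for this sketch.

```python
import numpy as np

def iterative_truncated_mean(x, num_iters=20, min_radius=1.0):
    """Sketch of iterative truncated averaging (hypothetical details).

    At each iteration, keep only the samples within `radius` of the
    current estimate, average them, and shrink the radius. The floor
    `min_radius` reflects the assumption that the "signal" variances
    are bounded by 1 in the subset-of-signals model.
    """
    x = np.asarray(x, dtype=float)
    est = np.median(x)                       # robust initial estimate
    radius = np.max(np.abs(x - est))         # start wide enough to keep everything
    for _ in range(num_iters):
        kept = x[np.abs(x - est) <= radius]  # truncate around current estimate
        if kept.size == 0:
            break
        est = kept.mean()                    # average the truncated samples
        radius = max(radius / 2.0, min_radius)  # shrink the truncation window
    return est
```

As the window shrinks, samples from the high-variance "noise" Gaussians are increasingly filtered out, so the average is dominated by the $m$ low-variance signal samples concentrated near the common mean.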