We study polynomial-time algorithms for estimating the mean of a random vector $X$ in $\mathbb{R}^d$ from $n$ independent samples $X_1,\ldots,X_n$ when $X$ may be heavy-tailed. We assume only that $X$ has finite mean $\mu$ and covariance $\Sigma$. In this setting, the radius of the confidence intervals achieved by the empirical mean is large compared to the case in which $X$ is Gaussian or sub-Gaussian. In particular, for confidence $\delta > 0$, the empirical mean has confidence intervals with radius of order $\sqrt{\text{Tr}\, \Sigma / (\delta n)}$, rather than the $\sqrt{\text{Tr}\, \Sigma /n } + \sqrt{ \lambda_{\max}(\Sigma) \log (1/\delta) / n}$ attainable in the Gaussian case.
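For orientation, the heavy-tailed radius is just the Chebyshev rate for the empirical mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, whose covariance is $\Sigma/n$:
$$\Pr\bigl[\|\bar{X} - \mu\| \ge t\bigr] \;\le\; \frac{\mathbb{E}\|\bar{X} - \mu\|^2}{t^2} \;=\; \frac{\text{Tr}\,\Sigma}{n t^2},$$
which equals $\delta$ at $t = \sqrt{\text{Tr}\,\Sigma/(\delta n)}$; under only a second-moment assumption, this dependence on $\delta$ is in general unavoidable for the empirical mean itself.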
We offer the first polynomial-time algorithm to estimate the mean with sub-Gaussian confidence intervals under such mild assumptions. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators that assumed only the existence of $O(1)$ moments of $X$ either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring $\exp(d)$ time.
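The sketch below is not the semidefinite programming estimator of this paper; it is a minimal implementation of the classical median-of-means baseline, one of the polynomial-time "previous estimators" alluded to above, which works under the same finite-covariance assumption but does not attain the full sub-Gaussian radius in high dimensions. The bucket count $k \approx 8\log(1/\delta)$ and the Weiszfeld iteration parameters are illustrative choices, not prescribed by the paper.

```python
import numpy as np

# Illustrative baseline, NOT the semidefinite programming estimator of the paper:
# classical median-of-means. Split the samples into k buckets, average each bucket,
# then combine the bucket means with a geometric median (Weiszfeld iterations).
# The constant in k and the iteration settings are arbitrary illustrative choices.

def geometric_median(points, iters=200, tol=1e-9):
    """Approximate geometric median of the rows of `points` via Weiszfeld's algorithm."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - z, axis=1), tol)
        weights = 1.0 / dists
        z_next = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(z_next - z) <= tol:
            return z_next
        z = z_next
    return z

def median_of_means(samples, delta):
    """Median-of-means estimate of the mean at confidence level delta.

    `samples` is an (n, d) array; the number of buckets k ~ log(1/delta)
    (the constant 8 is chosen here only for illustration).
    """
    n = samples.shape[0]
    k = int(min(n, max(1, np.ceil(8.0 * np.log(1.0 / delta)))))
    buckets = np.array_split(samples, k)
    bucket_means = np.stack([bucket.mean(axis=0) for bucket in buckets])
    return geometric_median(bucket_means)

# Example usage on hypothetical heavy-tailed data:
# X = np.random.default_rng(0).standard_t(3, size=(1000, 20))
# mu_hat = median_of_means(X, delta=0.01)
```

Averaging within buckets tames the heavy tails, and the geometric median makes the combining step robust to the few outlying bucket means; any constant-factor approximate geometric median suffices for the baseline guarantee.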