A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates

We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in $\mathbb{R}^d$, given $n$ i.i.d. samples. The goal is to design an efficient estimator $\hat{\mu}$ that attains the optimal sub-gaussian error bound $\|\hat{\mu} - \mu\| = O\big(\sqrt{\operatorname{tr}(\Sigma)/n} + \sqrt{\|\Sigma\| \log(1/\delta)/n}\big)$ with probability at least $1 - \delta$, assuming only that the random vector has finite mean $\mu$ and covariance $\Sigma$. Polynomial-time solutions to this problem are known, but they have high runtime due to their use of semidefinite programming (SDP). Conceptually, it remains open whether convex relaxation is truly necessary for this problem. In this work, we show that it is possible to go beyond SDP and achieve better computational efficiency. In particular, we provide a spectral algorithm that achieves the optimal statistical performance and runs in time $\widetilde O(n^2 d)$, improving upon the previous fastest runtime of $\widetilde O(n^{3.5} + n^2 d)$ by Cherapanamjeri et al. (COLT '19). Our algorithm is spectral in that it only requires (approximate) eigenvector computations, which can be implemented very efficiently by, for example, power iteration or the Lanczos method. At the core of our algorithm is a novel connection between the furthest hyperplane problem introduced by Karnin et al. (COLT '12) and a structural lemma on heavy-tailed distributions by Lugosi and Mendelson (Ann. Stat. '19). This allows us to iteratively reduce the estimation error at a geometric rate using only the information derived from the top singular vector of the data matrix, leading to a significantly faster running time.

[1] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. arXiv:1009.2048, 2010.

[2] Daniel M. Kane et al. Recent Advances in Algorithmic High-Dimensional Robust Statistics. ArXiv, 2019.

[3] Stanislav Minsker. Geometric median and robust estimation in Banach spaces. arXiv:1308.1334, 2013.

[4] Samuel B. Hopkins. Sub-Gaussian Mean Estimation in Polynomial Time. ArXiv, 2018.

[5] Eric Moulines et al. MONK - Outlier-Robust Mean Embedding Estimation by Median-of-Means. ICML, 2018.

[6] Jakub W. Pachocki et al. Geometric median in nearly linear time. STOC, 2016.

[7] G. Lugosi et al. Sub-Gaussian estimators of the mean of a random vector. The Annals of Statistics, 2017.

[8] Samuel B. Hopkins. Mean estimation with sub-Gaussian rates in polynomial time. The Annals of Statistics, 2018.

[9] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance. FOCS, 2016.

[10] Pradeep Ravikumar et al. A Unified Approach to Robust Mean Estimation. ArXiv, 2019.

[11] Noga Alon et al. The Space Complexity of Approximating the Frequency Moments. 1999.

[12] M. Okamoto. Some inequalities relating to the partial sum of binomial probabilities. 1959.

[13] Boaz Barak et al. The uniform hardcore lemma via approximate Bregman projections. SODA, 2009.

[14] G. Lugosi et al. Sub-Gaussian mean estimators. arXiv:1509.05845, 2015.

[15] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability. FOCS, 2016.

[16] G. Lecué et al. Robust sub-Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics, 2019.

[17] G. Lugosi et al. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, 2018.

[18] Shahar Mendelson et al. Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey. Found. Comput. Math., 2019.

[19] Prasad Raghavendra et al. Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond. STOC, 2019.

[20] Leslie G. Valiant et al. Random Generation of Combinatorial Structures from a Uniform Distribution. Theor. Comput. Sci., 1986.

[21] Avrim Blum et al. Foundations of Data Science. 2020.

[22] Sanjeev Arora et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory Comput., 2012.

[23] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. 1983.

[24] Samuel B. Hopkins et al. Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection. NeurIPS, 2019.

[25] Yu Cheng et al. High-Dimensional Robust Mean Estimation in Nearly-Linear Time. SODA, 2018.

[26] Peter L. Bartlett et al. Fast Mean Estimation with Sub-Gaussian Rates. COLT, 2019.

[27] Shachar Lovett et al. Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem. COLT, 2012.

[28] G. Lugosi et al. On the estimation of the mean of a random vector. arXiv:1607.05421, 2016.