Huber's gross-errors contamination model considers the class $\mathcal{F}_\varepsilon$ of all noise distributions $F = (1-\varepsilon)\Phi + \varepsilon H$, with $\Phi$ standard normal, $\varepsilon \in (0,1)$ the contamination fraction, and $H$ the contaminating distribution. A half century ago, Huber evaluated the minimax asymptotic variance in scalar location estimation,
$$\min_\psi \max_{F \in \mathcal{F}_\varepsilon} V(\psi, F) = \frac{1}{I(F^*_\varepsilon)}, \qquad (1)$$
where $V(\psi, F)$ denotes the asymptotic variance of the (M)-estimator for location with score function $\psi$, and $I(F^*_\varepsilon) = \min_{\mathcal{F}_\varepsilon} I(F)$ is the minimal Fisher information over the class.

We consider the linear regression model $Y = X\theta_0 + W$, with errors $W_i \sim_{\mathrm{iid}} F$ and i.i.d. normal predictors $X_{i,j}$, working in the high-dimensional-limit asymptotic where the number $n$ of observations and the number $p$ of variables both grow large, while $n/p \to m \in (1,\infty)$; hence $m$ plays the role of the asymptotic number of observations per parameter estimated. Let $V_m(\psi, F)$ denote the per-coordinate asymptotic variance of the (M)-estimator of regression in the $n/p \to m$ regime [EKBBL13, DM13, Kar13]. Then $V_m \neq V$; however, $V_m \to V$ as $m \to \infty$.

In this paper we evaluate the minimax asymptotic variance of the Huber (M)-estimate. The statistician minimizes over the family $(\psi_\lambda)_{\lambda > 0}$ of all tunings of Huber (M)-estimates of regression, and Nature maximizes over gross-error contaminations $F \in \mathcal{F}_\varepsilon$. Suppose that $I(F^*_\varepsilon) \cdot m > 1$. Then
$$\min_\lambda \max_{F \in \mathcal{F}_\varepsilon} V_m(\psi_\lambda, F) = \frac{1}{I(F^*_\varepsilon) - 1/m}. \qquad (2)$$
Of course, the right-hand side of (2) is strictly larger than the right-hand side of (1). Strikingly, if $I(F^*_\varepsilon) \cdot m \leq 1$, then
$$\min_\lambda \max_{F \in \mathcal{F}_\varepsilon} V_m(\psi_\lambda, F) = \infty.$$
In short, the asymptotic variance of the Huber estimator breaks down at a critical ratio of observations per parameter. Classically, for the minimax (M)-estimator of location, no such breakdown occurs [DH83]. Under this paper's $n/p \to m$ asymptotic, however, breakdown occurs where the Fisher information per parameter equals unity:
$$\varepsilon^* \equiv \varepsilon_m(\text{Minimax Huber-(M) Estimate}) = \inf\{\varepsilon : m \cdot I(F^*_\varepsilon) \leq 1\}.$$
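The quantities above can be evaluated numerically. The sketch below is not from the paper's code; it relies on two classical facts about Huber's least-favorable distribution $F^*_\varepsilon$: its score is the Huber score clipped at a level $k$ solving $2\varphi(k)/k - 2\Phi(-k) = \varepsilon/(1-\varepsilon)$, and $I(F^*_\varepsilon) = (1-\varepsilon)(2\Phi(k)-1)$. Given $I(F^*_\varepsilon)$, the right-hand sides of (1) and (2) follow directly, including the blow-up past the breakdown.

```python
# Numerical sketch (illustrative, not the paper's code): compute the minimal
# Fisher information I(F*_eps) for Huber's eps-contamination class, then the
# classical minimax variance 1/I (RHS of (1)) and the high-dimensional
# minimax variance 1/(I - 1/m) (RHS of (2)), which is infinite once I*m <= 1.
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def huber_k(eps, lo=1e-8, hi=20.0):
    """Clipping level k of the minimax Huber score for contamination eps,
    solving 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps) by bisection
    (the left-hand side is strictly decreasing in k)."""
    target = eps / (1 - eps)
    f = lambda k: 2 * phi(k) / k - 2 * Phi(-k) - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fisher_info(eps):
    """Minimal Fisher information I(F*_eps) over the contamination class."""
    k = huber_k(eps)
    return (1 - eps) * (2 * Phi(k) - 1)

def minimax_variance(eps, m=None):
    """RHS of (1) when m is None; RHS of (2) otherwise.
    Returns infinity past the breakdown, i.e. when I(F*_eps) * m <= 1."""
    I = fisher_info(eps)
    if m is None:
        return 1 / I
    return 1 / (I - 1 / m) if I * m > 1 else math.inf

if __name__ == "__main__":
    eps = 0.05
    print("I(F*_eps)        =", fisher_info(eps))
    print("location minimax =", minimax_variance(eps))        # RHS of (1)
    print("regression, m=5  =", minimax_variance(eps, m=5))   # RHS of (2)
    print("regression, m=1.2 =", minimax_variance(eps, m=1.2))
```

For $\varepsilon = 0.05$ this recovers the classical clipping level $k \approx 1.4$, and one sees directly that the $m$-dependent variance exceeds the classical one and diverges once $m$ drops below $1/I(F^*_\varepsilon)$.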
∗Department of Statistics, Stanford University †Department of Electrical Engineering and Department of Statistics, Stanford University
[1] Cuthbert Daniel et al., Fitting Equations to Data: Computer Analysis of Multifactor Data, 1980.
[2] P. J. Huber, Robust Regression: Asymptotics, Conjectures and Monte Carlo, 1973.
[3] F. Hampel, The Influence Curve and Its Role in Robust Estimation, 1974.
[4] Frederick R. Forst et al., On robust estimation of the location parameter, 1980.
[5] S. Portnoy, Asymptotic Behavior of M-Estimators of p Regression Parameters when p²/n is Large. I. Consistency, 1984.
[6] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p, 1985.
[7] V. Serdobolʹskiĭ, Multivariate Statistical Analysis: A High-Dimensional Approach, 2000.
[8] B. Ripley et al., Robust Statistics, Encyclopedia of Mathematical Geosciences, 2018.
[9] Terence Tao et al., The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.
[10] P. Bickel et al., Simultaneous analysis of Lasso and Dantzig selector, 2008, arXiv:0801.1095.
[11] Sara van de Geer et al., Statistics for High-Dimensional Data, 2011.
[12] P. Bickel et al., Optimal M-estimation in high-dimensional regression, Proceedings of the National Academy of Sciences, 2013.
[13] P. Bickel et al., On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, 2013.
[14] Noureddine El Karoui et al., Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results, 2013, arXiv:1311.2445.
[15] Andrea Montanari et al., High dimensional robust M-estimation: asymptotic variance via approximate message passing, Probability Theory and Related Fields, 2013.