Variance Breakdown of Huber (M)-Estimators: n/p → m ∈ (1, ∞)

Huber’s gross-errors contamination model considers the class F_ε of all noise distributions F = (1 − ε)Φ + εH, with Φ standard normal, ε ∈ (0, 1) the contamination fraction, and H an arbitrary contaminating distribution. A half century ago, Huber evaluated the minimax asymptotic variance in scalar location estimation:

    min_ψ max_{F ∈ F_ε} V(ψ, F) = 1 / I(F*_ε),    (1)

where V(ψ, F) denotes the asymptotic variance of the (M)-estimator of location with score function ψ, and I(F*_ε) = min_{F ∈ F_ε} I(F) is the minimal Fisher information over the contamination class.

We consider the linear regression model Y = Xθ0 + W, with errors W_i i.i.d. ∼ F and i.i.d. normal predictors X_{i,j}, working in the high-dimensional asymptotic where the number n of observations and the number p of variables both grow large while n/p → m ∈ (1, ∞); thus m plays the role of ‘asymptotic number of observations per parameter estimated’. Let V_m(ψ, F) denote the per-coordinate asymptotic variance of the (M)-estimator of regression in the n/p → m regime [EKBBL13, DM13, Kar13]. Then V_m ≠ V; however, V_m → V as m → ∞.

In this paper we evaluate the minimax asymptotic variance of the Huber (M)-estimate: the statistician minimizes over the family (ψ_λ)_{λ>0} of all tunings of Huber (M)-estimates of regression, while Nature maximizes over gross-error contaminations F ∈ F_ε. Suppose that I(F*_ε) · m > 1. Then

    min_λ max_{F ∈ F_ε} V_m(ψ_λ, F) = 1 / (I(F*_ε) − 1/m).    (2)

Of course, the right-hand side of (2) is strictly larger than the right-hand side of (1). Strikingly, if I(F*_ε) · m ≤ 1, then

    min_λ max_{F ∈ F_ε} V_m(ψ_λ, F) = ∞.

In short, the asymptotic variance of the Huber estimator breaks down at a critical ratio of observations per parameter. Classically, for the minimax (M)-estimator of location, no such breakdown occurs [DH83]. Under this paper’s n/p → m asymptotic, however, breakdown occurs as soon as contamination drives the Fisher information per parameter below unity:

    ε* ≡ ε_m(Minimax Huber-(M) Estimate) = inf{ε : m · I(F*_ε) ≤ 1}.
∗Department of Statistics, Stanford University.
†Department of Electrical Engineering and Department of Statistics, Stanford University.
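The quantities appearing in (1) and (2) can be illustrated numerically. The sketch below is not taken from this paper: it assumes Huber’s classical (1964) characterization of the least-favorable distribution F*_ε, under which the optimal clipping level k solves 2φ(k)/k − 2Φ(−k) = ε/(1 − ε) and the minimal Fisher information equals I(F*_ε) = (1 − ε)(2Φ(k) − 1); the function names are illustrative, not from the paper.

```python
import math

def huber_clipping_level(eps, lo=1e-6, hi=20.0, tol=1e-12):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps) for the clipping level k
    (Huber 1964). The left-hand side is strictly decreasing in k, so a
    simple bisection on the bracket [lo, hi] converges."""
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    Phi = lambda x: (1 + math.erf(x / math.sqrt(2))) / 2
    g = lambda k: 2 * phi(k) / k - 2 * Phi(-k) - eps / (1 - eps)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def min_fisher_info(eps):
    """Minimal Fisher information I(F*_eps) = (1-eps)*(2*Phi(k)-1) over the
    eps-contamination class, attained at Huber's least-favorable F*_eps."""
    k = huber_clipping_level(eps)
    Phi = lambda x: (1 + math.erf(x / math.sqrt(2))) / 2
    return (1 - eps) * (2 * Phi(k) - 1)

def minimax_variance(eps, m=float("inf")):
    """Right-hand side of (2): 1 / (I(F*_eps) - 1/m). Taking m = inf
    recovers the classical scalar formula (1). Past the variance-breakdown
    point, where I(F*_eps) * m <= 1, the minimax variance is infinite."""
    I = min_fisher_info(eps)
    denom = I - (0.0 if math.isinf(m) else 1 / m)
    return math.inf if denom <= 0 else 1 / denom
```

For example, at ε = 0.05 this gives k ≈ 1.40 and I(F*_ε) ≈ 0.80, so the classical minimax variance (1) is about 1.26, while at m = 2 the right-hand side of (2) is about 3.4; as ε grows toward the breakdown point ε*(m), the denominator I(F*_ε) − 1/m vanishes and the minimax variance diverges.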

[1] Cuthbert Daniel et al. Fitting Equations to Data: Computer Analysis of Multifactor Data. 1980.

[2] P. J. Huber. Robust Regression: Asymptotics, Conjectures and Monte Carlo. 1973.

[3] F. Hampel. The Influence Curve and Its Role in Robust Estimation. 1974.

[4] Frederick R. Forst et al. On Robust Estimation of the Location Parameter. 1980.

[5] S. Portnoy. Asymptotic Behavior of M-Estimators of p Regression Parameters when p²/n is Large. I. Consistency. 1984.

[6] S. Portnoy. Asymptotic Behavior of M-Estimators of p Regression Parameters when p²/n is Large. II. Normal Approximation. 1985.

[7] V. Serdobolskii. Multivariate Statistical Analysis: A High-Dimensional Approach. 2000.

[8] B. Ripley et al. Robust Statistics. Encyclopedia of Mathematical Geosciences, 2018.

[9] Emmanuel Candès and Terence Tao. The Dantzig Selector: Statistical Estimation when p is Much Larger than n. 2005, arXiv:math/0506081.

[10] P. Bickel et al. Simultaneous Analysis of Lasso and Dantzig Selector. 2008, arXiv:0801.1095.

[11] Sara van de Geer et al. Statistics for High-Dimensional Data. 2011.

[12] P. Bickel et al. Optimal M-estimation in High-Dimensional Regression. Proceedings of the National Academy of Sciences, 2013.

[13] P. Bickel et al. On Robust Regression with High-Dimensional Predictors. Proceedings of the National Academy of Sciences, 2013.

[14] Noureddine El Karoui. Asymptotic Behavior of Unregularized and Ridge-Regularized High-Dimensional Robust Regression Estimators: Rigorous Results. 2013, arXiv:1311.2445.

[15] Andrea Montanari et al. High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. Probability Theory and Related Fields, 2013.