Huber's gross-errors contamination model considers the class $\mathcal{F}_\varepsilon$ of all noise distributions $F = (1-\varepsilon)\Phi + \varepsilon H$, with $\Phi$ standard normal, $\varepsilon \in (0,1)$ the contamination fraction, and $H$ the contaminating distribution. A half century ago, Huber evaluated the minimax asymptotic variance in scalar location estimation,
$$\min_\psi \max_{F \in \mathcal{F}_\varepsilon} V(\psi, F) = \frac{1}{I(F^*_\varepsilon)}, \qquad (1)$$
where $V(\psi, F)$ denotes the asymptotic variance of the (M)-estimator for location with score function $\psi$, and $I(F^*_\varepsilon) = \min_{\mathcal{F}_\varepsilon} I(F)$ is the minimal Fisher information over the class.

We consider the linear regression model $Y = X\theta_0 + W$, with errors $W_i \sim_{\mathrm{iid}} F$ and i.i.d. normal predictors $X_{i,j}$, working in the high-dimensional-limit asymptotic where the number $n$ of observations and the number $p$ of variables both grow large, while $n/p \to m \in (1,\infty)$; hence $m$ plays the role of the asymptotic number of observations per parameter estimated. Let $V_m(\psi, F)$ denote the per-coordinate asymptotic variance of the (M)-estimator of regression in the $n/p \to m$ regime [EKBBL13, DM13, Kar13]. Then $V_m \neq V$; however, $V_m \to V$ as $m \to \infty$.

In this paper we evaluate the minimax asymptotic variance of the Huber (M)-estimate. The statistician minimizes over the family $(\psi_\lambda)_{\lambda > 0}$ of all tunings of Huber (M)-estimates of regression, and Nature maximizes over gross-error contaminations $F \in \mathcal{F}_\varepsilon$. Suppose that $I(F^*_\varepsilon) \cdot m > 1$. Then
$$\min_\lambda \max_{F \in \mathcal{F}_\varepsilon} V_m(\psi_\lambda, F) = \frac{1}{I(F^*_\varepsilon) - 1/m}. \qquad (2)$$
Of course, the right-hand side of (2) is strictly larger than the right-hand side of (1). Strikingly, if $I(F^*_\varepsilon) \cdot m \leq 1$, then
$$\min_\lambda \max_{F \in \mathcal{F}_\varepsilon} V_m(\psi_\lambda, F) = \infty.$$
In short, the asymptotic variance of the Huber estimator breaks down at a critical ratio of observations per parameter. Classically, for the minimax (M)-estimator of location, no such breakdown occurs [DH83]. Under this paper's $n/p \to m$ asymptotic, however, breakdown occurs where the Fisher information per parameter equals unity:
$$\varepsilon^* \equiv \varepsilon_m(\text{Minimax Huber-(M) Estimate}) = \inf\{\varepsilon : m \cdot I(F^*_\varepsilon) \leq 1\}.$$
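The quantities above can be evaluated numerically. The sketch below is not from the paper's code; it relies on two classical facts about Huber's least-favorable distribution $F^*_\varepsilon$: its score is the Huber score clipped at a level $k$ solving $2\varphi(k)/k - 2\Phi(-k) = \varepsilon/(1-\varepsilon)$, and $I(F^*_\varepsilon) = (1-\varepsilon)(2\Phi(k)-1)$. Given $I(F^*_\varepsilon)$, the right-hand sides of (1) and (2) follow directly, including the blow-up past the breakdown.

```python
# Numerical sketch (illustrative, not the paper's code): compute the minimal
# Fisher information I(F*_eps) for Huber's eps-contamination class, then the
# classical minimax variance 1/I (RHS of (1)) and the high-dimensional
# minimax variance 1/(I - 1/m) (RHS of (2)), which is infinite once I*m <= 1.
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def huber_k(eps, lo=1e-8, hi=20.0):
    """Clipping level k of the minimax Huber score for contamination eps,
    solving 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps) by bisection
    (the left-hand side is strictly decreasing in k)."""
    target = eps / (1 - eps)
    f = lambda k: 2 * phi(k) / k - 2 * Phi(-k) - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fisher_info(eps):
    """Minimal Fisher information I(F*_eps) over the contamination class."""
    k = huber_k(eps)
    return (1 - eps) * (2 * Phi(k) - 1)

def minimax_variance(eps, m=None):
    """RHS of (1) when m is None; RHS of (2) otherwise.
    Returns infinity past the breakdown, i.e. when I(F*_eps) * m <= 1."""
    I = fisher_info(eps)
    if m is None:
        return 1 / I
    return 1 / (I - 1 / m) if I * m > 1 else math.inf

if __name__ == "__main__":
    eps = 0.05
    print("I(F*_eps)        =", fisher_info(eps))
    print("location minimax =", minimax_variance(eps))        # RHS of (1)
    print("regression, m=5  =", minimax_variance(eps, m=5))   # RHS of (2)
    print("regression, m=1.2 =", minimax_variance(eps, m=1.2))
```

For $\varepsilon = 0.05$ this recovers the classical clipping level $k \approx 1.4$, and one sees directly that the $m$-dependent variance exceeds the classical one and diverges once $m$ drops below $1/I(F^*_\varepsilon)$.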
∗Department of Statistics, Stanford University †Department of Electrical Engineering and Department of Statistics, Stanford University
[1] Cuthbert Daniel et al., Fitting Equations to Data: Computer Analysis of Multifactor Data, 1980.
[2] P. J. Huber, Robust Regression: Asymptotics, Conjectures and Monte Carlo, 1973.
[3] F. Hampel, The Influence Curve and Its Role in Robust Estimation, 1974.
[4] Frederick R. Forst et al., On robust estimation of the location parameter, 1980.
[5] S. Portnoy, Asymptotic Behavior of M-Estimators of p Regression Parameters when p²/n is Large. I. Consistency, 1984.
[6] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p, 1985.
[7] V. Serdobolʹskiĭ, Multivariate Statistical Analysis: A High-Dimensional Approach, 2000.
[8] B. Ripley et al., Robust Statistics, Encyclopedia of Mathematical Geosciences, 2018.
[9] Terence Tao et al., The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.
[10] P. Bickel et al., Simultaneous analysis of Lasso and Dantzig selector, 2008, arXiv:0801.1095.
[11] Sara van de Geer et al., Statistics for High-Dimensional Data, 2011.
[12] P. Bickel et al., Optimal M-estimation in high-dimensional regression, Proceedings of the National Academy of Sciences, 2013.
[13] P. Bickel et al., On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, 2013.
[14] Noureddine El Karoui et al., Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results, 2013, arXiv:1311.2445.
[15] Andrea Montanari et al., High dimensional robust M-estimation: asymptotic variance via approximate message passing, Probability Theory and Related Fields, 2013.