Downweighting Influential Clusters in Surveys

Certain clusters may be extremely influential on survey estimates and consequently contribute disproportionately to their variance. We propose a general approach to estimation that downweights highly influential clusters, with the amount of downweighting based on M-estimation applied to the empirical influence of the clusters. The method is motivated by a problem in census coverage estimation, and we illustrate it by using data from the 1990 Post Enumeration Survey (PES). In this context, an objective, prespecified methodology for handling influential observations is essential to avoid having to justify judgmental post hoc adjustment of weights. In 1990, both extreme weights and large errors in the census led to extreme influence. We estimated influence by Taylor linearization of the survey estimator, and we applied M-estimators based on the t distribution and the Huber ψ-function. As predicted by theory, the robust procedures greatly reduced the estimated variance of estimated coverage rates, more so than did truncation of weights. On the other hand, the procedure may introduce bias into survey estimates when the distributions of the influence statistics are asymmetric. We consider the properties of the estimators in the presence of asymmetry, and we demonstrate techniques for assessing the bias-variance trade-off, finding that estimated mean squared error is reduced by applying the robust procedure to our dataset. We also suggest PES design improvements to reduce the impact of influential clusters.
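As a rough illustration of the downweighting idea, the sketch below computes Huber-type weights for a vector of cluster influence values. It is only a minimal sketch under stated assumptions, not the authors' procedure: the function name `huber_weights`, the tuning constant `k = 1.345`, the MAD-based scale, and the toy influence values are all hypothetical choices made for this example. How such weights would be fed back into the survey estimator is not specified in the abstract.

```python
import statistics


def huber_weights(influence, k=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of location for cluster influence values.

    Returns (location, weights). Each weight is psi(r) / r for the Huber
    psi-function, so clusters within k scale units of the location keep
    weight 1 and more extreme clusters are shrunk in proportion.
    All defaults are illustrative assumptions, not values from the paper.
    """
    values = list(influence)
    mu = statistics.median(values)
    # Assumed robust scale: median absolute deviation, rescaled to be
    # consistent with the standard deviation under normality.
    mad = statistics.median([abs(z - mu) for z in values])
    cutoff = k * 1.4826 * mad
    if cutoff == 0:
        # Degenerate case: no spread to measure, so nothing is downweighted.
        return mu, [1.0] * len(values)

    def weights_at(center):
        out = []
        for z in values:
            r = abs(z - center)
            out.append(1.0 if r <= cutoff else cutoff / r)
        return out

    # Iteratively reweighted mean with the scale held fixed.
    for _ in range(max_iter):
        w = weights_at(mu)
        new_mu = sum(wi * zi for wi, zi in zip(w, values)) / sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, weights_at(mu)


# Toy influence values (hypothetical): the last cluster is far more
# influential than the rest.
toy_influence = [0.10, -0.20, 0.05, 0.00, -0.10, 0.15, 5.00]
location, weights = huber_weights(toy_influence)
```

In this toy input the last cluster receives a weight far below 1 while the typical clusters keep full weight. When the influence values are asymmetric, as the abstract cautions, this kind of downweighting can shift the estimate, so the weights alone do not settle the bias-variance trade-off.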
