ROBUST MULTIVARIATE OUTLIER DETECTION USING MAHALANOBIS’ DISTANCE AND MODIFIED STAHEL-DONOHO ESTIMATORS

This paper illustrates the practical application of a robust multivariate outlier detection method used to edit survey data. Outliers are identified by calculating Mahalanobis’ distance, with the location vector and scatter matrix robustly estimated using modified Stahel-Donoho estimators. The Annual Wholesale and Retail Trade Survey and the Monthly Survey of Manufacturers at Statistics Canada have used this method successfully for several years. Neither survey currently uses sampling weights during outlier detection, and we propose a simple method for incorporating them. To compare outlier detection with and without sampling weights, results are presented for a simulated contaminated bivariate population.
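The general idea can be sketched in a few lines of Python. This is a minimal illustration, not the modified estimators used in the paper. It assumes a textbook Stahel-Donoho construction: outlyingness is approximated over random projection directions, a Huber-type weight function is applied with a hypothetical tuning constant `c`, and no consistency correction or sampling weights are used. The function names, the simulated data, and the 0.975 chi-square cutoff are all assumptions made for the example.

```python
import numpy as np

def stahel_donoho(X, n_dirs=500, c=2.5, seed=0):
    """Approximate Stahel-Donoho location and scatter (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Random unit directions stand in for the supremum over all directions.
    A = rng.standard_normal((n_dirs, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    Z = X @ A.T                                   # projections, shape (n, n_dirs)
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    # Outlyingness: largest robust z-score across the sampled directions.
    r = np.max(np.abs(Z - med) / mad, axis=1)
    # Huber-type weights (assumed form): full weight up to c, then decaying.
    w = np.minimum(1.0, (c / r) ** 2)
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - mu
    V = (w[:, None] * diff).T @ diff / w.sum()
    return mu, V

def mahalanobis_sq(X, mu, V):
    """Squared Mahalanobis distance of each row of X from mu under scatter V."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(V), diff)

# Simulated contaminated bivariate population (hypothetical parameters).
rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 2))
bad = rng.normal(loc=[8.0, -8.0], scale=0.5, size=(10, 2))
X = np.vstack([clean, bad])

mu, V = stahel_donoho(X)
d2 = mahalanobis_sq(X, mu, V)
# For 2 degrees of freedom the chi-square quantile has a closed form.
cutoff = -2.0 * np.log(1.0 - 0.975)
flags = d2 > cutoff
```

Because the weights shrink the scatter estimate slightly, a production version would normally rescale `V` for consistency at the normal model; the sketch omits that step.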
