Unmasking Multivariate Outliers and Leverage Points

Abstract: Detecting outliers in a multivariate point cloud is not trivial, especially when there are several of them. The classical identification method does not always find them, because it is based on the sample mean and covariance matrix, which are themselves affected by the outliers. This is how outliers become masked. To avoid the masking effect, we propose computing distances based on very robust estimates of location and covariance. These robust distances are better suited to exposing the outliers. In the case of regression data, the classical least squares approach masks outliers in a similar way. Here too, the outliers may be unmasked by using a highly robust regression method. Finally, a new display is proposed in which the robust regression residuals are plotted against the robust distances. This plot classifies the data into regular observations, vertical outliers, good leverage points, and bad leverage points. Several examples are discussed.
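The masking effect and the four-way classification described above can be illustrated with a small numerical sketch. It is not the estimator used in the paper: the robust fit below is a crude concentration scheme (repeatedly refitting on the half of the data closest to the current fit), swapped in only because it is short. The function names, the synthetic bivariate data, the 0.975 chi-square cutoff for distances, and the 2.5 cutoff for standardized residuals are all assumptions made for illustration, and the closed-form chi-square quantile holds for two dimensions only.

```python
import numpy as np

def mahalanobis(X, center, cov):
    """Distance of each row of X from `center`, measured in the metric of `cov`."""
    diff = X - center
    return np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T))

def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * np.log(1.0 - q)

def robust_location_scatter(X, n_starts=50, n_steps=20, seed=0):
    """Hypothetical helper: concentration-based robust fit for bivariate data.

    Starts from random (p+1)-point subsets, repeatedly refits on the h points
    closest to the current fit, and keeps the fit with the smallest determinant.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2
    best_det, best = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _step in range(n_steps):
            center, cov = X[idx].mean(axis=0), np.cov(X[idx], rowvar=False)
            if np.linalg.det(cov) < 1e-12:  # degenerate subset (scale-dependent guard)
                break
            idx = np.argsort(mahalanobis(X, center, cov))[:h]
        center, cov = X[idx].mean(axis=0), np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if 1e-12 < det < best_det:
            best_det, best = det, (center, cov)
    if best is None:
        raise ValueError("no non-degenerate subset found")
    center, cov = best
    # Rescale so the median squared distance matches the chi-square median (p = 2 only).
    d2 = mahalanobis(X, center, cov) ** 2
    return center, cov * np.median(d2) / chi2_quantile_2df(0.5)

def classify(std_resid, robust_dist, dist_cut, resid_cut=2.5):
    """Label points by whether residual and/or distance exceed their cutoffs."""
    names = np.array(["regular", "vertical outlier", "good leverage", "bad leverage"])
    big_r = np.abs(std_resid) > resid_cut
    big_d = robust_dist > dist_cut
    return names[big_r.astype(int) + 2 * big_d.astype(int)]

# Synthetic data (assumed): 40 regular points plus a tight cluster of 10 outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(40, 2)),
               np.array([5.0, 5.0]) + 0.1 * rng.normal(size=(10, 2))])
is_outlier = np.arange(50) >= 40

cutoff = np.sqrt(chi2_quantile_2df(0.975))
md = mahalanobis(X, X.mean(axis=0), np.cov(X, rowvar=False))  # classical distances
rd = mahalanobis(X, *robust_location_scatter(X))              # robust distances

print("outliers flagged by classical distances:", int(np.sum(md[is_outlier] > cutoff)))
print("outliers flagged by robust distances:   ", int(np.sum(rd[is_outlier] > cutoff)))

# Toy residuals and distances (made up) to show the four classes.
labels = classify(np.array([0.3, 4.0, -0.5, -3.1]),
                  np.array([1.0, 1.2, 5.0, 6.0]), dist_cut=cutoff)
print(labels.tolist())
```

In this setup the cluster of outliers pulls the sample mean toward itself and inflates the sample covariance, so its classical distances stay below the cutoff, while the distances computed from the robust fit separate the cluster clearly. For real analyses an established high-breakdown estimator would be a better choice than this sketch.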
