A procedure for the detection of multivariate outliers

Abstract. Single-case diagnostics are susceptible to a masking effect. This has led to the development of methods for detecting multiple multivariate outliers. The available methods work well but may not always detect outliers in data with a contamination fraction greater than 35%, as reported by Rocke and Woodruff (1996, J. Am. Statist. Assoc. 91, 1047–1061). In this paper we propose a new method for the detection of outliers that is very resistant to such high contamination. The simulation results indicate that, while maintaining the nominal level, the proposed method is never worse than the Rocke and Woodruff method and detects outliers better in highly contaminated data (35–45% outliers). Improved performance was also noted for data with a smaller contamination fraction (15–20%) when the outliers were situated closer to the “good” data. Several data sets are used to illustrate the proposed procedure.
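The masking effect mentioned in the abstract can be illustrated with a small sketch. This is not the procedure proposed in the paper; it only shows, on made-up bivariate data with 40% clustered contamination, how squared Mahalanobis distances based on the classical mean and covariance can fail to flag any outlier. The data, the function names, and the 0.975 chi-square cutoff are assumptions chosen for illustration, and the "reference" fit uses the known good points, which is not possible in practice.

```python
import math

def mean_cov(points):
    """Classical sample mean and covariance (n - 1 divisor) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def sq_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def flag(points, center, cov, cutoff):
    """Indices of points whose squared distance exceeds the cutoff."""
    return [i for i, p in enumerate(points)
            if sq_mahalanobis(p, center, cov) > cutoff]

# Hypothetical data: 12 "good" points near the origin, 8 outliers near (10, 10).
good = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.5, -0.5, 0.5, 1.5)]
outliers = [(10.0 + dx, 10.0 + dy)
            for dx in (-0.1, 0.1) for dy in (-0.15, -0.05, 0.05, 0.15)]
data = good + outliers

# For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
cutoff = -2.0 * math.log(1.0 - 0.975)

# Classical fit on all data: the outlier cluster inflates the covariance.
center_all, cov_all = mean_cov(data)
classical_flags = flag(data, center_all, cov_all, cutoff)

# Reference fit on the known good points only (an oracle, for comparison).
center_good, cov_good = mean_cov(good)
reference_flags = flag(data, center_good, cov_good, cutoff)

print("flagged with classical estimates:", classical_flags)
print("flagged with good-data estimates:", reference_flags)
```

With the classical estimates, the cluster of outliers pulls the mean toward itself and stretches the covariance along the line joining the two groups, so no point exceeds the cutoff. With estimates based on the good points alone, the eight contaminating points are flagged, which is the behavior a robust estimator aims to approximate without knowing which points are good.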

[1] N. Campbell. Robust Procedures in Multivariate Analysis I: Robust Covariance Estimation, 1980.

[2] A. Hadi. A Modification of a Method for the Detection of Outliers in Multivariate Samples, 1994.

[3] Victor J. Yohai, et al. The Behavior of the Stahel-Donoho Robust Multivariate Estimator, 1995.

[4] Werner A. Stahel, et al. Robust Statistics: The Approach Based on Influence Functions, 1987.

[5] David L. Woodruff, et al. Identification of Outliers in Multivariate Data, 1996.

[6] Douglas M. Hawkins, et al. The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data, 1994.

[7] David E. Tyler. Some Issues in the Robust Estimation of Multivariate Location and Scatter, 1991.

[8] J. Daudin, et al. Stability of principal component analysis studied by the bootstrap method, 1988.

[9] P. Rousseeuw, et al. Unmasking Multivariate Outliers and Leverage Points, 1990.

[10] A. Hadi. Identifying Multiple Outliers in Multivariate Data, 1992.

[11] David E. Tyler. Robustness and efficiency properties of scatter matrices, 1983.

[12] David L. Woodruff, et al. Computation of robust estimates of multivariate location and shape, 1993.

[13] P. L. Davies, et al. Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices, 1987.

[14] P. Rousseeuw. Least Median of Squares Regression, 1984.

[15] W. W. Muir, et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 1980.

[16] R. Maronna. Robust M-Estimators of Multivariate Location and Scatter, 1976.

[17] S. Chatterjee. Sensitivity analysis in linear regression, 1988.

[18] Shizuhiko Nishisato, et al. Elements of Dual Scaling: An Introduction To Practical Data Analysis, 1993.

[19] S. Weisberg, et al. Residuals and Influence in Regression, 1982.

[20] P. Rousseeuw. Multivariate estimation with high breakdown point, 1985.

[21] Sanford Weisberg, et al. Directions in Robust Statistics and Diagnostics, 1991.

[22] H. P. Lopuhaä. On the relation between S-estimators and M-estimators of multivariate location and covariance, 1989.

[23] Peter J. Rousseeuw, et al. Robust regression and outlier detection, 1987.

[24] A. Atkinson. Fast Very Robust Methods for the Detection of Multiple Outliers, 1994.

[25] G. V. Kass, et al. Location of Several Outliers in Multiple-Regression Data Using Elemental Sets, 1984.

[26] N. Campbell. Robust Procedures in Multivariate Analysis II: Robust Canonical Variate Analysis, 1982.

[27] S. J. Devlin, et al. Robust Estimation of Dispersion Matrices and Principal Components, 1981.