Identification of Outliers in Multivariate Data

Abstract: New insights are given into why the problem of detecting multivariate outliers can be difficult and why the difficulty increases with the dimension of the data. Significant improvements in methods for detecting outliers are described, and extensive simulation experiments demonstrate that a hybrid method extends the practical boundaries of outlier detection capabilities. Based on simulation results and examples from the literature, the levels of contamination that this algorithm can detect are investigated as a function of dimension, computation time, sample size, contamination fraction, and distance of the contamination from the main body of data. Software to implement the methods is available from the authors and from STATLIB.
