Robust Estimates of Location and Dispersion for High-Dimensional Datasets

The computing times of high-breakdown-point estimates of multivariate location and scatter increase rapidly with the number of variables, which makes them impractical for high-dimensional datasets such as those used in data mining. We propose an estimator of location and scatter based on a modified version of the Gnanadesikan–Kettenring robust covariance estimate. We compare its behavior with that of the Stahel–Donoho (SD) estimate and Rousseeuw and Van Driessen's fast MCD (FMCD) estimate. In simulations with contaminated multivariate normal data, our estimate is almost as good as SD and clearly better than FMCD. It is much faster than both, especially in high dimensions. We give examples using real datasets with dimensions between 5 and 93, in which the proposed estimate is as good as or better than SD and FMCD at detecting outliers and other structures, with much shorter computing times.
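For orientation, the classical (unmodified) Gnanadesikan–Kettenring pairwise estimate can be sketched as follows. It rests on the identity cov(X, Y) = (σ(X + Y)² − σ(X − Y)²) / 4, with the standard deviation σ replaced by a robust scale. This is only an illustrative sketch: the abstract does not describe the proposed modification, so none of it is reproduced here. The choice of the normalized MAD as the robust scale and the function names `mad_scale`, `gk_covariance`, and `gk_matrix` are assumptions made for this example.

```python
import random
from statistics import median


def mad_scale(values):
    """Median absolute deviation, scaled by 1.4826 so that it is
    consistent with the standard deviation for normal data.
    (Assumed choice of robust scale for this sketch.)"""
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)


def gk_covariance(x, y, scale=mad_scale):
    """Classical Gnanadesikan-Kettenring pairwise covariance:
    (scale(x + y)^2 - scale(x - y)^2) / 4."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (scale(plus) ** 2 - scale(minus) ** 2) / 4.0


def gk_matrix(columns, scale=mad_scale):
    """Pairwise GK scatter matrix for a list of p variables (columns).
    Requires on the order of p^2 univariate scale evaluations."""
    p = len(columns)
    return [[gk_covariance(columns[j], columns[k], scale) for k in range(p)]
            for j in range(p)]


# Small demonstration on correlated normal data with 10% gross outliers.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [xi + random.gauss(0.0, 0.5) for xi in x]
for i in range(20):  # contaminate the first 20 observations
    x[i], y[i] = 50.0, -50.0
print(gk_matrix([x, y]))
```

Each entry needs only univariate scale computations, which is why pairwise estimates of this kind scale well with the number of variables. A matrix assembled entry by entry in this way is not guaranteed to be positive semidefinite, which is one reason a modified version is of interest.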
