A fast algorithm for the minimum covariance determinant estimator

The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find the h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size: one concerns a production process at Philips with n = 677 objects and p = 9 variables, and the other a dataset from astronomy with n = 137,256 objects and p = 27 variables. To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD. It is based on an inequality involving order statistics and determinants, and on techniques which we call “selective iteration” and “nested extensions.” For small datasets, FAST-MCD typically finds the exact MCD, whereas for larger datasets it gives more accurate results than existing algorithms and is faster by orders...
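The abstract names the ingredients of FAST-MCD without showing them. Below is a minimal pure-Python sketch of how two of them can be understood; it is an illustration under stated assumptions, not the authors' implementation. The "inequality involving order statistics and determinants" is taken here to be the concentration step (C-step): fit the mean and covariance of h observations, compute the squared Mahalanobis distances of all n observations to that fit, and keep the h observations with the smallest distances. The covariance determinant of the new subset is then no larger than that of the old one. "Selective iteration" is approximated by running only two C-steps from each random (p + 1)-point start and iterating only the lowest-determinant candidates to convergence. Nested extensions for large n are not shown. The function names (`fit`, `sq_mahalanobis`, `c_step`, `concentrate`, `fast_mcd_sketch`), the defaults `n_starts=50` and `n_keep=10`, and the choice to skip starts with a singular covariance are all hypothetical, not taken from the paper.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch: names, defaults and the singular-start handling are illustrative.
Vector = List[float]
Matrix = List[List[float]]


def fit(data: Sequence[Vector], subset: Sequence[int]) -> Tuple[Vector, Matrix, float]:
    """Mean, Cholesky factor L of the covariance (S = L L^T) and log det(S) of a subset."""
    rows = [data[i] for i in subset]
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:  # absolute threshold, adequate only for a sketch
                    raise ValueError("singular covariance matrix")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # det(S) = prod(L_ii)^2, so log det(S) = 2 * sum(log L_ii)
    return mu, L, 2.0 * sum(math.log(L[i][i]) for i in range(p))


def sq_mahalanobis(x: Vector, mu: Vector, L: Matrix) -> float:
    """(x - mu)^T S^{-1} (x - mu), via forward substitution in L z = x - mu."""
    z: Vector = []
    for i in range(len(L)):
        z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in z)


def c_step(data: Sequence[Vector], subset: Sequence[int], h: int) -> List[int]:
    """Fit `subset`, then keep the h observations with the smallest distances to that fit."""
    mu, L, _ = fit(data, subset)
    d2 = [sq_mahalanobis(x, mu, L) for x in data]
    return sorted(sorted(range(len(data)), key=d2.__getitem__)[:h])


def concentrate(
    data: Sequence[Vector], subset: List[int], h: int, max_steps: int
) -> Tuple[List[int], float, List[float]]:
    """Apply C-steps until the subset stops changing or max_steps is reached.

    Returns the final subset, its log-determinant, and the log-determinant after
    every step; by the C-step inequality this history should never increase.
    """
    history = [fit(data, subset)[2]]
    for _ in range(max_steps):
        new_subset = c_step(data, subset, h)
        if new_subset == subset:  # fixed point: further C-steps change nothing
            break
        subset = new_subset
        history.append(fit(data, subset)[2])
    return subset, history[-1], history


def fast_mcd_sketch(
    data: Sequence[Vector], h: int, n_starts: int = 50, n_keep: int = 10, seed: int = 0
) -> Tuple[List[int], float]:
    """Random (p+1)-point starts, two C-steps each, then full iteration of the best few."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    candidates: List[Tuple[float, List[int]]] = []
    for _ in range(n_starts):
        start = rng.sample(range(n), p + 1)
        try:
            h1 = c_step(data, start, h)  # first h-subset from a (p+1)-point fit
            subset, log_det, _ = concentrate(data, h1, h, max_steps=2)
        except ValueError:
            continue  # simplification: skip starts whose covariance is singular
        candidates.append((log_det, subset))
    candidates.sort(key=lambda c: c[0])
    best: Optional[Tuple[List[int], float]] = None
    for _, subset in candidates[:n_keep]:  # iterate only the most promising starts
        subset, log_det, _ = concentrate(data, subset, h, max_steps=100)
        if best is None or log_det < best[1]:
            best = (subset, log_det)
    if best is None:
        raise ValueError("every random start gave a singular covariance matrix")
    return best
```

For data with n observations in p dimensions one could call `fast_mcd_sketch(data, h=(n + p + 1) // 2)`, a common choice of h that gives maximal breakdown; the returned indices form the estimated MCD subset. Comparing log-determinants rather than determinants avoids underflow as p grows, and stopping as soon as a C-step leaves the subset unchanged is safe because every later C-step would return the same subset.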
