RelaxMCD: Smooth optimisation for the Minimum Covariance Determinant estimator

The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the centre and shape of a high-dimensional data set. It consists of determining a subsample of h points out of n which minimises the generalised variance, that is, the determinant of the subsample covariance matrix. By definition, the computation of this estimator gives rise to a combinatorial optimisation problem, for which several approximate algorithms have been developed. Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the objective function. Recently, a general approach has been developed that transforms any discrete, high-dimensional combinatorial problem of this type into a continuous, low-dimensional one, together with a general algorithm for solving the transformed problem. The present work builds on that general algorithm in order to take into account particular features of the MCD methodology. More specifically, two main goals are considered: (a) adaptation of the algorithm to the specific MCD target function and (b) comparison of this 'tuned' algorithm with the usual competitors for computing MCD. The adaptation focuses on the design of 'clever' starting points in order to investigate the search domain systematically. Accordingly, a new and surprisingly efficient procedure based on a suitably equivariant modification of the well-known k-means algorithm is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the two others, yielding insight into their overall good performance.
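
To make the combinatorial problem concrete, the following is a minimal sketch of the MCD objective solved by exhaustive search. It is not RelaxMCD, FASTMCD or the Feasible Subset Algorithm: it simply enumerates every subsample of size h and keeps the one whose covariance matrix has the smallest determinant, which is only feasible for very small n. The function names (`exhaustive_mcd`, `covariance`, `determinant`), the use of the raw sample covariance with divisor h - 1, and the absence of any consistency factor are assumptions made for illustration.

```python
from itertools import combinations


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (divisor n - 1); an illustrative choice."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    centred = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    p = len(a)
    det = 1.0
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0  # singular matrix
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row swap flips the sign
        det *= a[k][k]
        for r in range(k + 1, p):
            factor = a[r][k] / a[k][k]
            for c in range(k, p):
                a[r][c] -= factor * a[k][c]
    return det


def exhaustive_mcd(data, h):
    """Return (subset indices, centre, covariance, generalised variance)
    for the size-h subsample with minimal covariance determinant.

    Brute force over all C(n, h) subsets: a hypothetical helper for toy data.
    """
    best_gv, best_subset = None, None
    for subset in combinations(range(len(data)), h):
        gv = determinant(covariance([data[i] for i in subset]))
        if best_gv is None or gv < best_gv:
            best_gv, best_subset = gv, subset
    rows = [data[i] for i in best_subset]
    return best_subset, mean_vector(rows), covariance(rows), best_gv


if __name__ == "__main__":
    # Six clustered points plus two gross outliers (made-up toy data).
    toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (0.5, 0.5), (0.2, 0.8), (10.0, 10.0), (11.0, 9.0)]
    subset, centre, cov, gv = exhaustive_mcd(toy, h=6)
    print(subset, centre, gv)
```

The number of candidate subsets, C(n, h), grows too quickly for this enumeration to be practical beyond toy examples, which is why approximate algorithms such as those compared in the abstract are used instead.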
