Fast Algorithms for Computing High Breakdown Covariance Matrices with Missing Data

Robust estimation of covariance matrices when some of the data are missing is an important problem. It has been studied by Little and Smith (1987) and, more recently, by Cheng and Victoria-Feser (2002). The latter propose the use of high-breakdown estimators and so-called hybrid algorithms (see, e.g., Woodruff and Rocke, 1994). In particular, they adapt the minimum volume ellipsoid of Rousseeuw (1984) to the case of missing data and compute it with a modified version of the forward search algorithm (see, e.g., Atkinson, 1994). In this paper, we propose instead to use a modification of the C-step algorithm of Rousseeuw and Van Driessen (1999), which is considerably faster. We also adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator of Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator. Finally, we conduct a simulation study comparing several robust estimators in terms of efficiency and breakdown.
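To make the C-step of Rousseeuw and Van Driessen (1999) concrete, here is a minimal sketch of the standard complete-data version only. It is not the missing-data modification proposed in this paper. Given an h-subset of observations, compute its mean and covariance, rank all observations by squared Mahalanobis distance to that fit, and keep the h closest. Rousseeuw and Van Driessen show that this step never increases the determinant of the subset covariance. The function names (`c_step`, `concentrate`, and so on), the median-based starting subset, the choice h = (n + p + 1) // 2, and the simulated data are all illustrative assumptions. FAST-MCD itself draws many random starting subsets and keeps the best result.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_cov(rows: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean and covariance (denominator n - 1) of the given rows."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def inverse_and_det(m: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan elimination with partial pivoting: returns (inverse, determinant)."""
    p = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    det = 1.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        if piv != col:
            aug[col], aug[piv] = aug[piv], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot = aug[col][col]
        det *= pivot
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug], det


def mahalanobis_sq(x: Vector, mu: Vector, inv: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T inv (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d[a] * inv[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))


def c_step(data: Sequence[Vector], subset: Sequence[int]) -> List[int]:
    """One concentration step: fit the h-subset, then keep the h points closest to that fit."""
    mu, cov = mean_and_cov([data[i] for i in subset])
    inv, _ = inverse_and_det(cov)
    dist = [mahalanobis_sq(x, mu, inv) for x in data]
    return sorted(range(len(data)), key=lambda i: dist[i])[:len(subset)]


def concentrate(data: Sequence[Vector], subset: Sequence[int],
                max_iter: int = 100) -> Tuple[List[int], float]:
    """Repeat C-steps until the h-subset stops changing; return (subset, det of its covariance)."""
    current = sorted(subset)
    for _ in range(max_iter):
        new = sorted(c_step(data, current))
        if new == current:
            break
        current = new
    _, cov = mean_and_cov([data[i] for i in current])
    return current, inverse_and_det(cov)[1]


# Illustrative data: 40 clean bivariate normal points plus 10 clustered outliers.
rng = random.Random(0)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
data += [[10 + rng.gauss(0, 0.5), 10 + rng.gauss(0, 0.5)] for _ in range(10)]
n, p = len(data), 2
h = (n + p + 1) // 2  # a common MCD choice of subset size

# Hypothetical start: the h points nearest the coordinatewise median.
med = [statistics.median(x[j] for x in data) for j in range(p)]
start = sorted(range(n), key=lambda i: sum((data[i][j] - med[j]) ** 2 for j in range(p)))[:h]

subset, det = concentrate(data, start)
print(len(subset), max(subset) < 40, det)
```

The loop here stops when the subset no longer changes. This is a simplified stopping rule, and `max_iter` guards against ties in the distances. Because the starting subset already excludes the outlying cluster, the concentration steps keep it excluded. From a poor random start, a single run can instead settle in a local minimum, which is why multiple starts, or a good robust start such as OGK, matter in practice.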

[1] M. Victoria-Feser et al. High-breakdown estimation of multivariate mean and covariance with missing observations. The British Journal of Mathematical and Statistical Psychology, 2002.

[2] W. R. Buckland et al. Contributions to Probability and Statistics. 1960.

[3] Peter J. Huber et al. Robust Statistics. Wiley Series in Probability and Statistics, 2005.

[4] D. Rubin. Inference and Missing Data. 1975.

[5] Peter J. Rousseeuw et al. Robust Regression and Outlier Detection. Wiley Series in Probability and Statistics, 2005.

[6] R. Little et al. Editing and Imputation for Quantitative Survey Data. 1987.

[7] P. J. Huber. Robust Estimation of a Location Parameter. 1964.

[8] F. Hampel. A General Qualitative Definition of Robustness. 1971.

[9] K. Yuan et al. Robust mean and covariance structure analysis. The British Journal of Mathematical and Statistical Psychology, 1998.

[10] P. Rousseeuw et al. A fast algorithm for the minimum covariance determinant estimator. 1999.

[11] R. Maronna. Robust M-Estimators of Multivariate Location and Scatter. 1976.

[12] Werner A. Stahel et al. New directions in statistical data analysis and robustness: Proceedings of the Workshop on Data Analysis and Robustness held in Ascona, 1992. 1994.

[13] Peter J. Rousseeuw et al. Robust Regression by Means of S-Estimators. 1984.

[14] A. Atkinson. Fast Very Robust Methods for the Detection of Multiple Outliers. 1994.

[15] David M. Rocke. Robustness properties of S-estimators of multivariate location and shape in high dimension. 1996.

[16] David M. Rocke et al. Computable Robust Estimation of Multivariate Location and Shape in High Dimension Using Compound Estimators. 1994.

[17] R. Gnanadesikan and J. R. Kettenring. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. 1972.

[18] P. Rousseeuw. Least Median of Squares Regression. 1984.

[19] T.-C. Cheng et al. High Breakdown Estimation of Multivariate Location and Scale With Missing Observations. 2000.

[20] Ruben H. Zamar et al. Robust Estimates of Location and Dispersion for High-Dimensional Datasets. Technometrics, 2002.

[21] W. Härdle et al. Robust and Nonlinear Time Series Analysis. 1984.

[22] B. Ripley et al. Robust Statistics. Encyclopedia of Mathematical Geosciences, 2018.

[23] D. Rubin et al. Statistical Analysis with Missing Data. 1988.

[24] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion). 1977.

[25] David M. Rocke et al. Heuristic Search Algorithms for the Minimum Volume Ellipsoid. 1993.

[26] P. L. Davies et al. Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. 1987.

[27] Douglas M. Hawkins et al. The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data. 1994.