Multivariate Outlier Detection With High-Breakdown Estimators

In this paper we develop multivariate outlier tests based on the high-breakdown Minimum Covariance Determinant (MCD) estimator. The rules that we propose perform well under the null hypothesis of no outliers in the data and also have appreciable power for detecting individual outliers. This is made possible by two improvements over the currently available methodology. First, we suggest an approximation to the exact distribution of robust distances from which cut-off values can be obtained even in small samples. Our thresholds are accurate, simple to implement, and result in more powerful outlier identification rules than those obtained by calibrating the asymptotic distribution of distances. The second improvement in power comes from adding a new iteration step after one-step reweighting of the estimator. The proposed methodology is motivated by asymptotic distributional results. Its finite-sample performance is evaluated through simulations and compared with that of available multivariate outlier tests.
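For orientation, the quantities the abstract refers to can be written out in standard notation. This is only a sketch of the usual textbook definitions, not the approximation proposed in the paper: the symbols $y_i$, $v$, $n$, $\hat{\mu}$, $\hat{\Sigma}$, $\alpha$ and $\delta$ are assumptions introduced here for illustration, and the chi-square cut-off shown is the asymptotic benchmark that the abstract says its finite-sample thresholds improve upon.

```latex
% Assumed notation: y_1, ..., y_n are v-dimensional observations;
% \hat{\mu} and \hat{\Sigma} are the MCD estimates of location and scatter.

% Squared robust (Mahalanobis-type) distance of observation i:
\[
  d_i^2 = (y_i - \hat{\mu})^{\top} \hat{\Sigma}^{-1} (y_i - \hat{\mu}),
  \qquad i = 1, \dots, n .
\]

% Asymptotic rule: flag y_i as an outlier at level \alpha when
\[
  d_i^2 > \chi^2_{v,\,1-\alpha},
\]
% where \chi^2_{v,1-\alpha} is the (1-\alpha) quantile of the chi-square
% distribution with v degrees of freedom.

% One-step reweighting: give weight zero to observations with a large
% distance, then recompute location and scatter from the remaining ones
% (\delta is a chosen trimming level, e.g. 0.025, an assumption here):
\[
  w_i =
  \begin{cases}
    1, & d_i^2 \le \chi^2_{v,\,1-\delta}, \\
    0, & \text{otherwise.}
  \end{cases}
\]
```

In small samples the distribution of $d_i^2$ can differ markedly from the chi-square limit, which is why cut-off values derived from a finite-sample approximation are of interest.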
