Finding an unknown number of multivariate outliers

Summary.  We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples.
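
To make the idea concrete, here is a minimal sketch of a forward search monitored through Mahalanobis distances. It is an illustration under stated assumptions, not the procedure of the paper: the function names, the median-based starting subset, and the default starting size `m0 = p + 1` are all hypothetical choices, and no distributional thresholds or truncation correction factors are applied. At each subset size m it fits the mean and covariance to the current subset, records the smallest distance among the observations outside the subset, and then grows the subset by one.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from (mean, cov)."""
    diff = X - mean
    # Solve cov @ z = diff.T rather than forming the inverse explicitly.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))


def forward_search_min_distances(X, m0=None):
    """Return (sizes, min_dist): for each subset size m, the smallest
    Mahalanobis distance among the observations outside the subset."""
    n, p = X.shape
    if m0 is None:
        m0 = p + 1  # assumed default: smallest size giving a nonsingular fit
    # Assumed starting rule: the m0 points closest to the coordinate-wise
    # median, scaled by the median absolute deviation.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    start = np.sum(((X - med) / mad) ** 2, axis=1)
    subset = np.argsort(start)[:m0]

    sizes, min_dist = [], []
    for m in range(m0, n):
        mean = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        d2 = mahalanobis_sq(X, mean, cov)
        outside = np.setdiff1d(np.arange(n), subset)
        sizes.append(m)
        min_dist.append(np.sqrt(d2[outside].min()))
        # Grow the subset: keep the m + 1 observations with smallest distances.
        subset = np.argsort(d2)[: m + 1]
    return np.array(sizes), np.array(min_dist)
```

With outliers present, the recorded minimum distance typically jumps once the subset has absorbed all the clean observations and the nearest remaining point is an outlier. Deciding whether such a jump is significant requires the distributional results that the summary refers to, which this sketch does not attempt to reproduce.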
