Using Clustering and Robust Estimators to Detect Outliers in Multivariate Data

Outlier identification is important in many applications of multivariate analysis, either because there is specific interest in finding anomalous observations or as a pre-processing step before applying some multivariate method, in order to protect the results from the possible harmful effects of those observations. It is also of great interest in discriminant analysis if, when predicting group membership, one wants to be able to label an observation as "does not belong to any of the available groups". The identification of outliers in multivariate data is usually based on the Mahalanobis distance, and the use of robust estimates of the mean and the covariance matrix is advised in order to avoid the masking effect (Rousseeuw and van Zomeren, 1990; Rocke and Woodruff, 1996; Becker and Gather, 1999). However, the performance of these rules is still highly dependent on the multivariate normality of the bulk of the data. The aim of the method described here is to remove this dependency. A first version of the method appeared in Santos-Pereira and Pires (2002). In this talk we discuss some refinements and also the relation to a recently proposed similar method (Hardin and Rocke, 2004).
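As a point of reference, the standard robust-distance approach mentioned above can be sketched as follows. This is not the method of the talk, only an illustration of the baseline it improves on: robust location and scatter are estimated with the Minimum Covariance Determinant (here via scikit-learn's `MinCovDet`, an assumed implementation choice), squared robust Mahalanobis distances are computed, and observations beyond a chi-square quantile are flagged. The chi-square cutoff is exactly where the normality assumption criticized in the abstract enters.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
# Bulk of the data: 200 bivariate standard-normal points,
# plus three planted outliers appended at the end.
X = rng.normal(size=(200, 2))
X = np.vstack([X, [[8.0, 8.0], [9.0, -7.0], [-8.0, 6.0]]])

# Robust estimates of location and scatter (Minimum Covariance Determinant).
mcd = MinCovDet(random_state=0).fit(X)

# Squared Mahalanobis distances based on the robust estimates.
d2 = mcd.mahalanobis(X)

# Flag points beyond the 97.5% chi-square quantile with p = 2 degrees
# of freedom -- this cutoff assumes the bulk is multivariate normal.
cutoff = chi2.ppf(0.975, df=X.shape[1])
outliers = np.where(d2 > cutoff)[0]

# The three planted points (indices 200-202) should be among those flagged;
# a few bulk points may also exceed the cutoff by chance (~2.5% of them).
print(outliers)
```

Because the robust estimates are fitted on the uncontaminated majority of the data, the planted outliers cannot mask one another, which is the motivation for replacing the classical mean and covariance in the first place.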