A Cluster-Based Outlier Detection Scheme for Multivariate Data

The detection power of the squared Mahalanobis distance statistic is significantly reduced when several outliers exist within a multivariate dataset of interest. To overcome this masking effect, we propose a computer-intensive, cluster-based approach that combines a reweighted version of Rousseeuw’s minimum covariance determinant method with a multi-step cluster-based algorithm that initially filters out potential masking points. Simulation studies show that the new method detects outliers better than the most robust procedures. Additional real data comparisons are given. Supplementary materials for this article are available online.
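The masking effect described above can be illustrated with a small numerical sketch. This is not the proposed method: the cluster-based filtering step is omitted, and the minimum covariance determinant estimate is approximated with a plain concentration-step search followed by a single reweighting pass, without consistency or small-sample correction factors. The function names (`sq_mahalanobis`, `approx_mcd`), the simulated data, the subset size `h=90`, and the 0.975 cutoff are all assumptions made for illustration.

```python
import numpy as np


def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of each row of X from (center, cov)."""
    diff = X - center
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)


def approx_mcd(X, h, n_starts=50, n_iter=30, seed=0):
    """Crude MCD approximation: random starts refined by concentration steps.

    Illustrative only; no consistency or small-sample corrections are applied.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best_center, best_cov = np.inf, None, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_iter):
            center = X[subset].mean(axis=0)
            cov = np.cov(X[subset], rowvar=False)
            # Concentration step: keep the h points closest to the current fit.
            new_subset = np.argsort(sq_mahalanobis(X, center, cov))[:h]
            if set(new_subset) == set(subset):
                break
            subset = new_subset
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_center, best_cov = det, center, cov
    return best_center, best_cov


# Simulated data (assumed): 100 clean bivariate normal points plus a tight
# cluster of 20 outliers near (6, 6).
rng = np.random.default_rng(1)
clean = rng.standard_normal((100, 2))
cluster = rng.normal(loc=6.0, scale=0.1, size=(20, 2))
X = np.vstack([clean, cluster])
is_outlier = np.r_[np.zeros(100, bool), np.ones(20, bool)]

# For 2 degrees of freedom the chi-square quantile has the closed form
# -2 * log(1 - q); here q = 0.975.
cutoff = -2.0 * np.log(1 - 0.975)

# Classical distances: mean and covariance are estimated from all points,
# so the outlying cluster inflates the covariance and hides itself.
classical_d2 = sq_mahalanobis(X, X.mean(axis=0), np.cov(X, rowvar=False))
classical_flags = classical_d2 > cutoff

# Robust distances: approximate MCD fit, then one reweighting pass that
# re-estimates location and scatter from points inside the cutoff.
raw_center, raw_cov = approx_mcd(X, h=90)
keep = sq_mahalanobis(X, raw_center, raw_cov) <= cutoff
rw_center, rw_cov = X[keep].mean(axis=0), np.cov(X[keep], rowvar=False)
robust_flags = sq_mahalanobis(X, rw_center, rw_cov) > cutoff

print("cluster points flagged (classical):", int(classical_flags[is_outlier].sum()))
print("cluster points flagged (robust):   ", int(robust_flags[is_outlier].sum()))
```

In this setup the classical statistic cannot flag the clustered points, because a group making up one sixth of the sample pulls the mean toward itself and inflates the covariance along its own direction. The robust fit is estimated from a subset that excludes the cluster, so the same points lie far outside the cutoff.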
