Trimming Tools in Exploratory Data Analysis

Exploratory graphical tools based on trimming are proposed for detecting the main clusters in a given dataset. The trimming is carried out through the trimmed k-means methodology. The analysis always reduces to the examination of real-valued curves, even in the multivariate case. Because the technique is based on a robust clustering criterion, it can handle the presence of different kinds of outliers. An algorithm is proposed to carry out this computer-intensive method. As with classical k-means, the method is especially suited to mixtures of spherical distributions; a possible generalization to overcome this limitation is outlined.
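The trimmed k-means criterion behind these tools looks for k centers and a subset containing all but a proportion α of the data that together minimize the mean squared distance from each retained point to its nearest center. The Python sketch below approximates it with random starts followed by concentration steps: assign points to centers, trim the farthest ones, recompute the centers. It is a generic heuristic, not necessarily the algorithm proposed in this work. As a further assumption for illustration, the "real-valued curves" are taken here to be the trimmed objective plotted against α for several values of k. The names `trimmed_kmeans` and `trimmed_objective_curve` are hypothetical.

```python
import numpy as np


def trimmed_kmeans(X, k, alpha, n_starts=20, n_iter=100, seed=0):
    """Approximate trimmed k-means: random starts plus concentration steps.

    Returns (centers, kept_mask, objective), where `objective` is the mean
    squared Euclidean distance from each retained point to its nearest center.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = n - int(np.floor(n * alpha))  # number of points kept after trimming
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        # Start from k distinct data points chosen at random.
        centers = X[rng.choice(n, size=k, replace=False)]
        for _ in range(n_iter):
            # Squared distances from every point to every center, shape (n, k).
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            dmin = d2[np.arange(n), nearest]
            # Concentration step: keep only the h points closest to a center.
            kept = np.argsort(dmin)[:h]
            new_centers = centers.copy()
            for j in range(k):
                members = kept[nearest[kept] == j]
                if members.size > 0:  # an empty cluster keeps its old center
                    new_centers[j] = X[members].mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        # Evaluate the trimmed objective for this start's final centers.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        dmin = d2.min(axis=1)
        kept = np.argsort(dmin)[:h]
        objective = dmin[kept].mean()
        if best is None or objective < best[2]:
            mask = np.zeros(n, dtype=bool)
            mask[kept] = True
            best = (centers, mask, objective)
    return best


def trimmed_objective_curve(X, k, alphas, **kwargs):
    """Trimmed k-means objective as a function of the trimming level alpha."""
    return np.array([trimmed_kmeans(X, k, a, **kwargs)[2] for a in alphas])


# Toy data: two tight spherical clusters plus a few gross outliers.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),   # cluster around (0, 0)
    rng.normal(10.0, 0.5, size=(50, 2)),  # cluster around (10, 10)
    rng.normal(100.0, 1.0, size=(5, 2)),  # outliers around (100, 100)
])
centers, kept, obj = trimmed_kmeans(X, k=2, alpha=0.1)
alphas = np.linspace(0.0, 0.3, 7)
curves = {k: trimmed_objective_curve(X, k, alphas) for k in (1, 2, 3)}
```

With two clusters and α = 0.1, the k = 2 fit should place one center in each cluster and trim the five outliers, giving an objective far below the k = 1 value; plotting each entry of `curves` against `alphas` is one way to compare candidate numbers of clusters.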
