Exploratory tools for clustering multivariate data

The forward search provides a series of robust parameter estimates based on increasing numbers of observations. The resulting series of robust Mahalanobis distances is used to cluster multivariate normal data. The method depends on envelopes of the distribution of the test statistics in forward plots. These envelopes can be found by simulation; flexible polynomial approximations to the envelopes are given. New graphical tools provide methods not only of detecting clusters but also of determining their membership. Comparisons are made with mclust and k-means clustering.
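The core loop of a forward search can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the usual form of the search: fit the mean and covariance to a subset of m observations, compute Mahalanobis distances for all n observations, record the minimum distance among the observations outside the subset, then grow the subset to the m + 1 observations with the smallest distances. The function names `mahalanobis_sq` and `forward_search` and the caller-supplied `start` subset are hypothetical choices made for this example. The simulation envelopes and their polynomial approximations are not reproduced here.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of every row of X from (mean, cov)."""
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T  # cov^{-1} (x - mean), row by row
    return np.einsum("ij,ij->i", diff, solved)


def forward_search(X, start):
    """Run one forward search from the index set `start` (an assumption:
    the caller chooses the initial subset, e.g. at random).

    Returns the subset sizes m and, for each m, the minimum Mahalanobis
    distance among the observations not in the current subset.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    subset = np.array(sorted(start))
    sizes, min_dist = [], []
    while len(subset) < n:
        # Estimate parameters from the current subset only.
        mean = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        d2 = mahalanobis_sq(X, mean, cov)
        # Monitor the closest observation that is still outside the subset.
        outside = np.setdiff1d(np.arange(n), subset)
        sizes.append(len(subset))
        min_dist.append(np.sqrt(d2[outside].min()))
        # The next subset is the m + 1 observations with smallest distances,
        # so observations may leave as well as enter.
        subset = np.argsort(d2, kind="stable")[: len(subset) + 1]
    return np.array(sizes), np.array(min_dist)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic bivariate normal clusters (illustrative data).
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
    sizes, md = forward_search(X, start=rng.choice(len(X), 5, replace=False))
    for m, d in zip(sizes, md):
        print(m, round(float(d), 3))
```

Plotting `md` against `sizes` gives a forward plot of the monitored distance. In the approach the abstract describes, such a trajectory would be compared with envelopes of its distribution under a single multivariate normal population; the initial subset must contain more observations than there are variables so that the covariance estimate is nonsingular.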
