Finding the Number of Disparate Clusters with Background Contamination

The Forward Search is used in an exploratory manner, with many random starts, to indicate the number of clusters and their membership in continuous data. The prospective clusters can readily be distinguished from background noise and from other forms of outliers. A confirmatory Forward Search, which controls the size of the statistical tests involved, then establishes precise cluster membership. The method performs as well as robust methods such as TCLUST. Unlike those methods, however, it requires prior specification of neither the number of clusters nor the level of trimming of outliers. In this sense it is “user friendly”.
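
The exploratory stage can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard form of a forward search based on Mahalanobis distances: starting from a small random subset, fit a mean and covariance, order all units by their distance from that fit, grow the subset by one unit, and record the minimum distance among units still outside it. The function names `forward_search` and `random_start_searches`, the initial subset size of p + 1, and the simulated two-cluster data are all assumptions made for illustration. The confirmatory stage with controlled test sizes is not shown.

```python
import numpy as np

def forward_search(X, start_idx):
    """Run one forward search from an initial subset (hypothetical helper).

    Returns the trajectory of minimum Mahalanobis distances among the
    units not in the fitting subset, one value per subset size.
    """
    n, _ = X.shape
    subset = np.asarray(start_idx)
    trajectory = []
    for m in range(len(subset), n):
        mean = X[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[subset], rowvar=False))
        diff = X - mean
        # Squared Mahalanobis distances of all n units from the current fit.
        # The pseudo-inverse guards against a near-singular early covariance.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        d2 = np.maximum(d2, 0.0)
        outside = np.ones(n, dtype=bool)
        outside[subset] = False
        trajectory.append(np.sqrt(d2[outside].min()))
        # The next subset is the m + 1 units closest to the current fit.
        subset = np.argsort(d2)[: m + 1]
    return np.array(trajectory)

def random_start_searches(X, n_starts=20, seed=0):
    """Run forward searches from many random starts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m0 = p + 1  # assumed initial subset size
    runs = [
        forward_search(X, rng.choice(n, size=m0, replace=False))
        for _ in range(n_starts)
    ]
    return np.vstack(runs)

# Illustrative data only: two well-separated Gaussian clusters in 2-D.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(30, 2)),
    data_rng.normal(8.0, 1.0, size=(30, 2)),
])
trajectories = random_start_searches(X, n_starts=5, seed=0)
```

Each row of `trajectories` is one search. Plotting the rows against subset size is the usual way to read them: a search that starts inside a single cluster tends to show a peak in the minimum distance near the point where that cluster is exhausted, and the positions of such peaks suggest the number and sizes of the clusters.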
