Trimming algorithms for clustering contaminated grouped data and their robustness

We establish an affine-equivariant, constrained heteroscedastic model and a criterion with trimming for clustering contaminated, grouped data. We show existence of the maximum likelihood estimator, propose a method for determining an appropriate constraint, and design a strategy for finding reasonable clusterings. Finally, we compute breakdown points of the estimated parameters, thereby showing asymptotic robustness of the method.
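To give a rough feel for what trimming does in a clustering procedure, the sketch below implements trimmed k-means, a much simpler spherical, homoscedastic relative of the approach summarized above. It is not the constrained heteroscedastic criterion of this paper. The function name `trimmed_kmeans`, its parameters, and the toy data are illustrative assumptions.

```python
import numpy as np

def trimmed_kmeans(X, init_centers, n_keep, n_iter=100):
    """Illustrative trimmed k-means (hypothetical helper, not the paper's method).

    In every iteration, only the n_keep points closest to their nearest
    center are used to update the centers; the rest are treated as outliers.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()

    def assign(c):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Trimming step: indices of the n_keep best-fitting points.
        keep = np.argsort(d2.min(axis=1), kind="stable")[:n_keep]
        return labels, keep

    for _ in range(n_iter):
        labels, keep = assign(centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = keep[labels[keep] == j]
            if members.size:  # leave a center unchanged if it lost all points
                new_centers[j] = X[members].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment; trimmed points receive the label -1.
    labels, keep = assign(centers)
    out = np.full(len(X), -1)
    out[keep] = labels[keep]
    return centers, out

# Toy data (assumed for illustration): two tight groups plus two gross outliers.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [100, 100], [-50, 80]], dtype=float)
centers, labels = trimmed_kmeans(X, init_centers=X[[0, 4]], n_keep=8)
print(centers)
print(labels)
```

The initial centers are passed in explicitly because a trimmed procedure of this kind can settle in a poor local optimum when an outlier is chosen as a starting center. In practice one would typically try several starting points and keep the solution with the best criterion value.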
