The TCLUST Approach to Robust Cluster Analysis

∗Research partially supported by Ministerio de Ciencia y Tecnología and FEDER grant MTM2005-08519C02-01 and by Consejería de Educación y Cultura de la Junta de Castilla y León grant PAPIJCL VA102A06.
†Departamento de Estadística e Investigación Operativa, Facultad de Ciencias, Universidad de Valladolid, 47002 Valladolid, Spain.

A new method for performing robust clustering is proposed. The method is designed to fit clusters with different scatters and weights, while allowing a proportion α of contaminating data points. Restrictions on the ratio between the maximum and the minimum eigenvalues of the group scatter matrices are introduced. These restrictions make the problem well defined and guarantee both the existence of the sample estimators and their consistency for the population parameters. The method covers a wide range of clustering approaches, which arise depending on the strength of the chosen restrictions. Our proposal includes an algorithm for approximately solving the sample problem that takes advantage of Dykstra's algorithm.
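As an illustration of the eigenvalue-ratio restriction mentioned in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes the ratio is taken between the largest and the smallest eigenvalue pooled over all group scatter matrices and compared with a user-chosen bound; the bound `c` and the function names `eigenvalue_ratio` and `satisfies_restriction` are hypothetical and are not taken from the paper.

```python
import numpy as np


def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue divided by smallest eigenvalue, pooled over all groups.

    Assumption: each scatter matrix is symmetric positive definite.
    """
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    eigenvalues = np.concatenate(
        [np.linalg.eigvalsh(np.asarray(s, dtype=float)) for s in scatter_matrices]
    )
    return float(eigenvalues.max() / eigenvalues.min())


def satisfies_restriction(scatter_matrices, c):
    """Check whether the pooled eigenvalue ratio is at most the bound c (hypothetical name)."""
    return bool(eigenvalue_ratio(scatter_matrices) <= c)


# Two illustrative group scatter matrices (made-up values).
scatters = [np.diag([1.0, 4.0]), np.diag([2.0, 8.0])]
ratio = eigenvalue_ratio(scatters)
```

Under this reading, a bound close to 1 forces all groups toward similarly scaled, nearly spherical scatters, while a large bound leaves the scatters almost unrestricted, which is one way to see how the strength of the restriction spans different clustering approaches.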
