Robust estimation in the normal mixture model based on robust clustering

We introduce a robust estimation procedure based on choosing a representative trimmed subsample through an initial robust clustering procedure, followed by successive improvements based on maximum likelihood. To obtain the initial trimming we resort to trimmed k-means, a simple procedure designed to find the core of the clusters under appropriate configurations. By handling the trimmed data as censored, maximum likelihood estimation provides, at each step, the location and shape of the next trimming. Data-driven restrictions on the parameters, which require every distribution in the mixture to be sufficiently represented in the initial clustered region, avoid singularities and guarantee the existence of the estimator. Our analysis covers robustness properties and asymptotic results, together with worked examples. Copyright (c) 2008 Royal Statistical Society.
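
To make the initial trimming step more concrete, the following is a minimal sketch of a Lloyd-type alternating heuristic for trimmed k-means. It is not the authors' implementation. The function names, the trimming proportion `alpha`, the optional `init` argument, and the stopping rule are all assumptions introduced for illustration. Each iteration assigns points to their nearest centre, discards the fraction `alpha` of points lying farthest from their nearest centre, and recomputes the centres from the retained points only.

```python
import numpy as np


def _assign_and_trim(X, centers, n_keep):
    """Assign each point to its nearest centre and flag the n_keep closest points."""
    # Squared Euclidean distance from every point to every centre: shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    nearest = d2.min(axis=1)
    keep = np.zeros(X.shape[0], dtype=bool)
    keep[np.argsort(nearest)[:n_keep]] = True  # everything else is trimmed
    return labels, keep


def trimmed_kmeans(X, k, alpha, init=None, n_iter=100, seed=0):
    """Illustrative trimmed k-means (hypothetical helper, not from the paper).

    alpha is the assumed trimming proportion: floor(alpha * n) points are discarded.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_keep = n - int(np.floor(alpha * n))
    if init is None:
        # Assumption: random data points as starting centres.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        labels, keep = _assign_and_trim(X, centers, n_keep)
        new_centers = centers.copy()
        for j in range(k):
            members = keep & (labels == j)
            if members.any():  # leave an empty cluster's centre unchanged
                new_centers[j] = X[members].mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Final assignment and trimming with respect to the returned centres.
    labels, keep = _assign_and_trim(X, centers, n_keep)
    return centers, labels, keep
```

In the setting described above, the retained points (`keep`) would play the role of the initial trimmed subsample, and the maximum likelihood steps that treat the trimmed data as censored would start from there. Like ordinary k-means, this heuristic can stop at a local optimum, so in practice several random starts would usually be compared.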
