Robust mixture modelling using the t distribution

Normal mixture models are increasingly being used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer-than-normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described, and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
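The fitting procedure summarised above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the ECM implementation from the paper or from the EMMIX software. It holds the degrees of freedom ν fixed, whereas the full ECM adds a conditional-maximisation step for ν. It also takes the initial means from the caller, and the helper names `mvt_logpdf` and `fit_t_mixture` are hypothetical. What it shows is the mechanism behind the robustness. Each observation enters the mean and scale updates with weight τ·u, where u = (ν + p)/(ν + δ) shrinks as the squared Mahalanobis distance δ grows, so points of background noise are downweighted.

```python
import math
import numpy as np


def mvt_logpdf(Y, mu, Sigma, nu):
    """Log-density of a p-variate t distribution (location mu, scale matrix
    Sigma, nu degrees of freedom) at each row of Y, plus the squared
    Mahalanobis distances delta used for the E-step weights."""
    p = Y.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (Y - mu).T)      # whitened residuals, shape (p, n)
    delta = np.sum(z ** 2, axis=0)          # (y - mu)' Sigma^{-1} (y - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return const - 0.5 * (nu + p) * np.log1p(delta / nu), delta


def fit_t_mixture(Y, init_means, nu=4.0, n_iter=500, tol=1e-8):
    """EM for a g-component multivariate t mixture with nu held FIXED
    (a simplifying assumption; the paper's ECM also updates nu)."""
    n, p = Y.shape
    mu = np.array(init_means, dtype=float)
    g = mu.shape[0]
    pi = np.full(g, 1.0 / g)
    S0 = np.atleast_2d(np.cov(Y, rowvar=False))   # start every component broad
    Sigma = np.array([S0.copy() for _ in range(g)])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior memberships tau and latent weights u.
        logf = np.empty((n, g))
        u = np.empty((n, g))
        for i in range(g):
            lp, delta = mvt_logpdf(Y, mu[i], Sigma[i], nu)
            logf[:, i] = np.log(pi[i]) + lp
            u[:, i] = (nu + p) / (nu + delta)     # small for outlying points
        m = logf.max(axis=1, keepdims=True)
        tau = np.exp(logf - m)
        loglik = float(np.sum(m[:, 0] + np.log(tau.sum(axis=1))))
        tau /= tau.sum(axis=1, keepdims=True)
        if loglik - prev < tol:                   # EM increases loglik monotonically
            break
        prev = loglik
        # M-step: each point enters the updates with weight tau * u.
        for i in range(g):
            w = tau[:, i] * u[:, i]
            pi[i] = tau[:, i].mean()
            mu[i] = w @ Y / w.sum()
            d = Y - mu[i]
            Sigma[i] = (w[:, None] * d).T @ d / tau[:, i].sum()
    return pi, mu, Sigma, tau, u, loglik


# Usage: two tight clusters plus uniform background noise (simulated data).
rng = np.random.default_rng(1)
A = rng.normal([0.0, 0.0], 0.5, size=(150, 2))
B = rng.normal([5.0, 5.0], 0.5, size=(150, 2))
noise = rng.uniform(-10.0, 15.0, size=(30, 2))
Y = np.vstack([A, B, noise])
pi, mu, Sigma, tau, u, loglik = fit_t_mixture(Y, init_means=[[1.0, 1.0], [4.0, 4.0]])
weight = (tau * u).sum(axis=1)    # effective weight of each point in the fit
print(np.round(mu, 2))
print(weight[:300].mean(), weight[300:].mean())
```

Because u falls towards zero as δ grows, the uniform background points receive much smaller effective weights than the clustered points. This is what protects the component means and scale matrices. A normal mixture corresponds to the limit ν → ∞, where every point has u = 1.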
