CLUSTERING VIA NORMAL MIXTURE MODELS

We consider a model-based approach to clustering, whereby each observation is assumed to have arisen from an underlying mixture of a finite number of distributions. The number of components in this mixture model corresponds to the number of clusters to be imposed on the data. A common assumption is to take the component distributions to be multivariate normal, perhaps with some restrictions on the component covariance matrices. The model can be fitted to the data by maximum likelihood, implemented via the EM algorithm. There are a number of computational issues associated with the fitting, including the specification of starting points for the EM algorithm and the carrying out of tests for the number of components in the final version of the model. We shall discuss some of these problems and describe an algorithm that attempts to handle them automatically.
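As an illustration of the fitting step described above, the following is a minimal sketch of EM for a mixture of g multivariate normal components with unrestricted covariance matrices. It is not the algorithm referred to in this abstract. The function names (`log_gauss`, `fit_normal_mixture`, `fit_best_of_starts`), the random-start scheme, the small ridge added to each covariance matrix, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)                    # cov = L L^T
    z = np.linalg.solve(L, (X - mean).T)           # shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))

def fit_normal_mixture(X, g, n_iter=200, tol=1e-8, ridge=1e-6, seed=0):
    """EM for a g-component normal mixture (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed starting point: g distinct observations as the means,
    # the pooled sample covariance for every component, equal proportions.
    means = X[rng.choice(n, size=g, replace=False)].copy()
    pooled = np.cov(X, rowvar=False) + ridge * np.eye(d)
    covs = np.array([pooled.copy() for _ in range(g)])
    props = np.full(g, 1.0 / g)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership (tau),
        # computed on the log scale for numerical stability.
        log_w = np.stack(
            [np.log(props[k]) + log_gauss(X, means[k], covs[k]) for k in range(g)],
            axis=1,
        )
        m = log_w.max(axis=1, keepdims=True)
        log_mix = m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1))
        tau = np.exp(log_w - log_mix[:, None])
        loglik = log_mix.sum()
        if loglik - prev < tol:                    # stop when the gain is negligible
            break
        prev = loglik
        # M-step: weighted proportions, means and covariance matrices.
        nk = tau.sum(axis=0)
        props = nk / n
        means = (tau.T @ X) / nk[:, None]
        for k in range(g):
            c = X - means[k]
            covs[k] = (tau[:, k, None] * c).T @ c / nk[k] + ridge * np.eye(d)
    return props, means, covs, tau, loglik

def fit_best_of_starts(X, g, n_starts=5):
    """Run EM from several random starts and keep the largest log likelihood."""
    fits = [fit_normal_mixture(X, g, seed=s) for s in range(n_starts)]
    return max(fits, key=lambda fit: fit[-1])

# Simulated two-cluster data, used only to exercise the sketch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 2)),
               rng.normal(6.0, 1.0, size=(150, 2))])
props, means, covs, tau, loglik = fit_best_of_starts(X, g=2)
labels = tau.argmax(axis=1)                        # assign each point to its most probable component
```

Each observation is assigned to the component with the highest posterior probability, which gives the clustering. Restarting from several initial values is one simple way to reduce the dependence on the starting point; the ridge term is a crude guard against singular covariance estimates and stands in for the covariance restrictions mentioned above.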