Paper information - Bayesian analysis of finite mixture distributions using the allocation sampler

Finite mixture distributions are receiving increasing attention from statisticians in many fields of research because they form a very flexible class of models. They are typically used for density estimation or to model population heterogeneity. One can think of a finite mixture distribution as grouping the observations into components from which they are assumed to have arisen; in certain settings these groups have a physical interpretation. Interest in these distributions has been boosted recently by the ever-increasing computing power available to researchers for the computationally intensive tasks their analysis requires. Fitting a finite mixture distribution with a Bayesian approach requires evaluating a posterior distribution. When the number of components in the model is assumed known, this posterior distribution can be sampled using methods such as data augmentation or Gibbs sampling (Tanner and Wong, 1987; Gelfand and Smith, 1990) and the Metropolis-Hastings algorithm (Hastings, 1970). However, the number of components can also be treated as unknown and as an object of inference. Richardson and Green (1997) and Stephens (2000a) both describe Bayesian methods that sample across models with different numbers of components, which allows the posterior distribution of the number of components to be estimated. Richardson and Green (1997) define a reversible jump Markov chain Monte Carlo (RJMCMC) sampler, while Stephens (2000a) uses a Markov birth-death process approach to sample from the posterior distribution. In this thesis a Markov chain Monte Carlo method, named the allocation sampler, is presented. This sampler differs from the RJMCMC method of Richardson and Green (1997) in that its state space is simplified by the assumption that the components' parameters and weights can be analytically integrated out of the model. This in turn has the advantage that only minimal changes to the sampler are required for mixtures of components from other parametric families.
This thesis illustrates the allocation sampler's performance on both simulated and real data sets. Chapter 1 provides background on finite mixture distributions and gives an overview of some inferential techniques that have already been used to analyse them. Chapter 2 sets out the Bayesian model framework used throughout the thesis and defines all the required distributional results. Chapter 3 describes the allocation sampler. Chapter 4 tests the performance of the allocation sampler using simulated data sets from a collection of 15 different known mixture distributions. Chapter 5 illustrates the allocation sampler with real data sets from a number of different research fields. Chapter 6 summarises the research in the thesis and suggests areas for possible future research.
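
To make the idea of integrating out the weights concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a symmetric Dirichlet prior with parameter alpha on the mixture weights, which the abstract does not state; under that assumption the marginal probability of an allocation vector has the closed Dirichlet-multinomial form. The function name log_allocation_prior and its arguments are hypothetical.

```python
from collections import Counter
from math import lgamma, exp


def log_allocation_prior(allocations, k, alpha=1.0):
    """Log marginal probability of an allocation vector given k components.

    Assumption (not from the abstract): the weights have a symmetric
    Dirichlet(alpha, ..., alpha) prior, so integrating them out gives
        Gamma(k*alpha) / Gamma(k*alpha + n)
            * prod_j Gamma(alpha + n_j) / Gamma(alpha),
    where n_j is the number of observations allocated to component j.
    """
    if any(g < 0 or g >= k for g in allocations):
        raise ValueError("allocation labels must lie in 0..k-1")
    n = len(allocations)
    counts = Counter(allocations)
    # Normalising term that depends only on n, k and alpha.
    log_p = lgamma(k * alpha) - lgamma(k * alpha + n)
    # One term per component, including empty components (n_j = 0).
    for j in range(k):
        log_p += lgamma(alpha + counts.get(j, 0)) - lgamma(alpha)
    return log_p


def allocation_prior(allocations, k, alpha=1.0):
    """Same quantity on the probability scale (fine for small n)."""
    return exp(log_allocation_prior(allocations, k, alpha))
```

For example, allocation_prior([0, 1], k=2) evaluates to 1/6 and allocation_prior([0, 0], k=2) to 1/3, so the four possible allocations of two observations to two components have probabilities summing to one. A sampler whose state consists of allocations and the number of components would combine a term like this with the marginal likelihood of the data given the allocations; that second term depends on the parametric family of the components and is not sketched here.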

A. Fearnside

[1] P. Nurmi. Mixture Models, 2008.

[2] A. Nobile. Bayesian finite mixtures: a note on prior specification and posterior computation, 2007, arXiv:0711.0458.

[3] Agostino Nobile, et al. Bayesian finite mixtures with an unknown number of components: The allocation sampler, 2007, Stat. Comput.

[4] Loukia Meligkotsidou, et al. Bayesian multivariate Poisson mixtures with an unknown number of components, 2007, Stat. Comput.

[5] Richard A. Levine, et al. Optimizing random scan Gibbs samplers, 2006.

[6] Adrian E. Raftery, et al. Computing Normalizing Constants for Finite Mixture Models via Incremental Mixture Importance Sampling (IMIS), 2006.

[7] Baibing Li. A new approach to cluster analysis: the clustering-function-based method, 2006.

[8] Clare A. McGrory, et al. Variational approximations in Bayesian model selection, 2005.

[9] Jean-Michel Marin, et al. Bayesian Modelling and Inference on Mixtures of Distributions, 2005.

[10] Agostino Nobile, et al. On the posterior distribution of the number of components in a finite mixture, 2004, arXiv:math/0503673.

[11] Zhihua Zhang, et al. Learning a multivariate Gaussian mixture model with the reversible jump MCMC algorithm, 2004, Stat. Comput.