A comparison of reversible jump MCMC algorithms for DNA sequence segmentation using hidden Markov models

This paper describes a Bayesian approach to determining the number of hidden states in a hidden Markov model (HMM) via reversible jump Markov chain Monte Carlo (MCMC) methods. Acceptance rates for these algorithms can be quite low, resulting in slow exploration of the posterior distribution. We consider a variety of reversible jump strategies which allow inferences to be made in discretely observed HMMs, with particular emphasis placed on the comparison of the competing strategies in terms of computational expense, algebraic complexity and performance. The methods are illustrated with an application to the segmentation of DNA sequences into compositionally homogeneous regions.
