Computational methods for complex stochastic systems: a review of some alternatives to MCMC

Abstract We consider the analysis of complex stochastic models based upon partial information. MCMC and reversible jump MCMC are often the methods of choice for such problems, but in some situations they can be difficult to implement, and they can suffer from problems such as poor mixing and the difficulty of diagnosing convergence. Here we review three alternatives to MCMC methods: importance sampling, the forward-backward algorithm, and sequential Monte Carlo (SMC). We discuss how to design good proposal densities for importance sampling, show some of the range of models to which the forward-backward algorithm can be applied, and show how resampling ideas from SMC can be used to improve the efficiency of the other two methods. We demonstrate these methods on a range of examples, including estimating the transition density of a diffusion and of a discrete-state continuous-time Markov chain; inferring structure in population genetics; and segmenting genetic divergence data.
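
As a rough illustration of two of the ideas named in the abstract, the sketch below combines self-normalised importance sampling with a resampling step of the kind used in SMC. It is a toy example and not taken from the paper: the Normal target, the Normal proposal, the sample size, and all function names (`importance_sample`, `effective_sample_size`, `multinomial_resample`) are assumptions chosen purely for illustration.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def importance_sample(log_target, proposal_mean, proposal_sd, n, rng):
    """Draw n points from a Normal proposal; return them with normalised weights."""
    xs = [rng.gauss(proposal_mean, proposal_sd) for _ in range(n)]
    log_w = [log_target(x) - normal_logpdf(x, proposal_mean, proposal_sd) for x in xs]
    m = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return xs, [wi / total for wi in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights; lies between 1 and n."""
    return 1.0 / sum(w * w for w in weights)


def multinomial_resample(xs, weights, rng):
    """Resample with replacement in proportion to the weights (equal weights after)."""
    return rng.choices(xs, weights=weights, k=len(xs))


rng = random.Random(0)

# Hypothetical toy target: Normal(1, 0.5^2). Only its log density is needed,
# and an unnormalised version would work equally well.
def log_target(x):
    return normal_logpdf(x, 1.0, 0.5)

xs, w = importance_sample(log_target, proposal_mean=0.0, proposal_sd=2.0, n=5000, rng=rng)
is_mean = sum(wi * xi for wi, xi in zip(w, xs))  # weighted estimate of the target mean
ess = effective_sample_size(w)                   # diagnostic for proposal quality
resampled = multinomial_resample(xs, w, rng)     # equally weighted sample after resampling
```

A low effective sample size relative to the number of draws suggests the proposal is poorly matched to the target. Multinomial resampling is used here only because it is the simplest scheme to write down; other resampling schemes could be substituted.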
