The time machine: a simulation approach for stochastic trees

In this paper, we consider a simulation technique for stochastic trees. One of the most important problems in computational genetics is the calculation, and subsequent maximization, of the likelihood function associated with such models. This is typically done with importance sampling and sequential Monte Carlo techniques, which simulate the tree backward in time from the observed data to a most recent common ancestor (MRCA). However, in many cases the computational time and the variance of the resulting estimators are too high for these standard approaches to be useful. We propose instead to stop the simulation before the MRCA is reached, which yields biased estimates of the likelihood surface. We study this bias from a theoretical point of view, and we present simulation studies that investigate the trade-off between loss of accuracy, savings in computing time, and variance reduction.
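The following is a minimal Python sketch of backward-in-time importance sampling and of stopping it early. It uses a deliberately simple toy model, the infinitely-many-alleles model. That model's exact likelihood is given by the Ewens sampling formula, so the estimates can be checked. This is not the authors' method and not the models studied in the paper.

A sample is summarised by its allele block sizes. Each backward step removes one gene from a block, and the importance weight accumulates the forward (Hoppe urn) transition probability divided by the proposal probability. All function and parameter names are hypothetical:

- `stop_at` is the number of lineages at which simulation halts.
- `tail` stands in for the unknown probability of the configuration reached at that point.

The default `tail` of 1 is a crude placeholder, assumed only to make the resulting bias visible.

```python
import math
import random
from collections import Counter


def esf_probability(sizes, theta):
    """Ewens sampling formula: probability of an allele partition with the given
    block sizes, e.g. (3, 1, 1) = one allele seen three times, two singletons."""
    n = sum(sizes)
    a = Counter(sizes)  # a[j] = number of alleles observed exactly j times
    rising = math.prod(theta + i for i in range(n))  # theta (theta+1) ... (theta+n-1)
    p = math.factorial(n) / rising
    for j, aj in a.items():
        p *= (theta / j) ** aj / math.factorial(aj)
    return p


def optimal_proposal(sizes):
    """Remove a uniformly chosen gene: a block of size j is hit w.p. j * a_j / m."""
    m = sum(sizes)
    return {j: j * aj / m for j, aj in Counter(sizes).items()}


def uniform_block_proposal(sizes):
    """Hypothetical cruder proposal: pick a block uniformly, ignoring its size."""
    k = len(sizes)
    return {j: aj / k for j, aj in Counter(sizes).items()}


def backward_is_weight(sizes, theta, rng, proposal=optimal_proposal,
                       stop_at=1, tail=lambda s: 1.0):
    """One importance weight, simulating backward from the sample until only
    `stop_at` lineages remain. `tail` multiplies in an approximation of the
    probability of the configuration reached there; with stop_at == 1 that
    configuration is a single lineage of probability 1, so the default is exact."""
    sizes = list(sizes)
    m = sum(sizes)
    log_w = 0.0
    while m > stop_at:
        probs = proposal(tuple(sizes))
        classes = list(probs)
        j = rng.choices(classes, weights=[probs[c] for c in classes])[0]
        # Predecessor state: one block of size j loses a gene (vanishes if j == 1).
        sizes.remove(j)
        if j > 1:
            sizes.append(j - 1)
        # Forward Hoppe-urn probability of moving from the predecessor to the current state.
        if j == 1:
            fwd = theta / (theta + m - 1)  # the removed gene founded a new allele
        else:
            fwd = (j - 1) * sizes.count(j - 1) / (theta + m - 1)  # it copied a gene in a (j-1)-block
        log_w += math.log(fwd) - math.log(probs[j])
        m -= 1
    return math.exp(log_w) * tail(tuple(sizes))


def estimate_likelihood(sizes, theta, n_samples, seed=0, **kwargs):
    """Average of independent importance weights (unbiased when stop_at == 1;
    otherwise the bias depends on how well `tail` approximates the remainder)."""
    rng = random.Random(seed)
    total = sum(backward_is_weight(sizes, theta, rng, **kwargs) for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    sizes, theta = (3, 1, 1), 1.5
    print("exact  ", esf_probability(sizes, theta))
    print("full   ", estimate_likelihood(sizes, theta, 20000, proposal=uniform_block_proposal))
    print("stopped", estimate_likelihood(sizes, theta, 20000, proposal=uniform_block_proposal, stop_at=3))
```

With `optimal_proposal`, which removes a uniformly chosen gene and is the exact backward kernel for this toy model, every weight equals the Ewens probability, so the estimator has zero variance. With `uniform_block_proposal` the weights vary. Setting `stop_at` above 1 shortens each run, but at the price of a bias that depends entirely on how well `tail` approximates the probability of the remaining configuration.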
