Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration

Phylogenetic analyses that include fossils or molecular sequences sampled through time require models that allow one sample to be a direct ancestor of another. Because previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models in which individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual is removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models in which every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer the others. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as part of the tree branching process. Modelling fossilisation in this way has several advantages that have been argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).
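To make the role of the removal probability concrete, the following is a minimal simulation sketch of a birth-death process with sampling in which a sampled individual is removed only with some probability. It is an illustration under stated assumptions, not the BEAST2 package's implementation: the function name `simulate_counts` and the parameter names `birth`, `death`, `sampling` and `removal_prob` are hypothetical, rates are held constant (no skyline intervals), and only event counts are tracked rather than the tree itself.

```python
import random


def simulate_counts(birth, death, sampling, removal_prob, t_max,
                    seed=0, max_lineages=10_000):
    """Gillespie-style simulation of a constant-rate birth-death-sampling process.

    Hypothetical illustration only. Each lineage gives birth at rate `birth`,
    dies at rate `death`, and is sampled at rate `sampling`. On sampling, the
    lineage is removed with probability `removal_prob`; otherwise it stays in
    the process and may still produce sampled descendants.
    """
    rng = random.Random(seed)
    per_lineage_rate = birth + death + sampling  # assumed to be > 0
    t, n = 0.0, 1                                # start from a single lineage
    removed_at_sampling = 0                      # samples that leave the process
    kept_after_sampling = 0                      # samples that remain in the process

    while 0 < n < max_lineages:
        # Waiting time to the next event among n independent lineages.
        t += rng.expovariate(n * per_lineage_rate)
        if t > t_max:
            break
        # Choose the event type in proportion to its rate.
        u = rng.random() * per_lineage_rate
        if u < birth:
            n += 1
        elif u < birth + death:
            n -= 1
        elif rng.random() < removal_prob:
            removed_at_sampling += 1
            n -= 1
        else:
            kept_after_sampling += 1

    return {
        "lineages": n,
        "removed_at_sampling": removed_at_sampling,
        "kept_after_sampling": kept_after_sampling,
    }


if __name__ == "__main__":
    print(simulate_counts(birth=1.5, death=0.5, sampling=0.5,
                          removal_prob=0.3, t_max=4.0, seed=1))
```

Individuals counted under `kept_after_sampling` are only candidates for sampled ancestors: one becomes a sampled ancestor in the sense used above only if one of its descendants is sampled later, which this count-only sketch does not track. Setting `removal_prob` to 1 recovers the case in which every sample is a tip.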

[1] Guy Baele, et al. The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates, 2014, PLoS Comput. Biol.

[2] Dong Xie, et al. BEAST 2: A Software Platform for Bayesian Evolutionary Analysis, 2014, PLoS Comput. Biol.

[3] Daniele Silvestro, et al. Bayesian estimation of speciation and extinction from incomplete fossil occurrence data, 2014, Systematic biology.

[4] J. Huelsenbeck, et al. The fossilized birth–death process for coherent calibration of divergence-time estimates, 2013, Proceedings of the National Academy of Sciences.

[5] Tanja Stadler, et al. Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth–death SIR model, 2013, Journal of The Royal Society Interface.

[6] Chieh-Hsi Wu. Bayesian approaches to Model Uncertainty in Phylogenetics, 2014.

[7] Erik M. Volz, et al. Inferring the Source of Transmission with Phylogenetic Data, 2013, PLoS Comput. Biol.

[8] C. G. Schrago, et al. Combining fossil and molecular data to date the diversification of New World Primates, 2013, Journal of evolutionary biology.

[9] Jacco Wallinga, et al. Relating Phylogenetic Trees to Transmission Trees of Infectious Disease Outbreaks, 2013, Genetics.

[10] David Welch, et al. Recursive algorithms for phylogenetic tree counting, 2013, Algorithms for Molecular Biology.

[11] Mirjam Kretzschmar, et al. Infectious disease transmission as a forensic problem: who infected whom?, 2013, Journal of The Royal Society Interface.

[12] Hannah M. Wood, et al. Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the palpimanoid spiders, 2013, Systematic biology.

[13] G. Didier, et al. The reconstructed evolutionary process with the fossil record, 2012, Journal of theoretical biology.

[14] S. Bonhoeffer, et al. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV), 2012, Proceedings of the National Academy of Sciences.

[15] Seraina Klopfstein, et al. A Total-Evidence Approach to Dating with Fossils, Applied to the Early Radiation of the Hymenoptera, 2012, Systematic biology.

[16] M. Laurin. Recent progress in paleontological methods for dating the Tree of Life, 2012, Front. Genet.

[17] M. Suchard, et al. Bayesian Phylogenetics with BEAUti and the BEAST 1.7, 2012, Molecular biology and evolution.

[18] Maxim Teslenko, et al. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space, 2012, Systematic biology.

[19] Alexei J. Drummond, et al. Calibrated Tree Priors for Relaxed Phylogenetics and Divergence Time Estimation, 2011, Systematic biology.

[20] Beda Joos, et al. Estimating the basic reproductive number from viral sequence data, 2012, Molecular biology and evolution.

[21] A. Pyron, et al. Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia, 2011, Systematic biology.

[22] T. Stadler. Sampling-through-time in birth-death trees, 2010, Journal of theoretical biology.

[23] Nicolas Lartillot, et al. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating, 2009, Bioinform.

[24] S. Ho, et al. Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times, 2009, Systematic biology.

[25] Richard D Wilkinson, et al. Estimating primate divergence times by using conditioned birth-and-death processes, 2009, Theoretical population biology.

[26] Ziheng Yang, et al. Inferring speciation times under an episodic molecular clock, 2007, Systematic biology.

[27] S. Ho, et al. Relaxed Phylogenetics and Dating with Confidence, 2006, PLoS biology.

[28] K. Holsinger, et al. Polytomies and Bayesian phylogenetic inference, 2005, Systematic biology.

[29] Stéphane Hué, et al. Genetic analysis reveals the complex structure of HIV-1 transmission within defined risk groups, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[30] O. Pybus, et al. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens, 2004, Science.

[31] A. Rodrigo, et al. Measurably evolving populations, 2003.

[32] Alexei J Drummond, et al. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data, 2002, Genetics.

[33] S. Tavaré, et al. Using the fossil record to estimate the age of the last common ancestor of extant primates, 2002, Nature.

[34] P. Lewis. A likelihood approach to estimating phylogeny from discrete morphological character data, 2001, Systematic biology.

[35] John P. Huelsenbeck, et al. MRBAYES: Bayesian inference of phylogenetic trees, 2001, Bioinform.

[36] O. Pybus, et al. The Epidemic Behavior of the Hepatitis C Virus, 2001, Science.

[37] J. Margolick, et al. Consistent Viral Evolutionary Changes Associated with the Progression of Human Immunodeficiency Virus Type 1 Infection, 1999, Journal of Virology.

[38] M A Newton, et al. Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods, 1999, Biometrics.

[39] H. Kishino, et al. Estimating the rate of evolution of the rate of molecular evolution, 1998, Molecular biology and evolution.

[40] D. Balding, et al. Genealogical inference from microsatellite data, 1998, Genetics.

[41] Michael J. Sanderson, et al. A Nonparametric Approach to Estimating Divergence Times in the Absence of Rate Constancy, 1997.

[42] B. Rannala, et al. Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method, 1997, Molecular biology and evolution.

[43] John A. Swets, et al. Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers, 1996.

[44] M. Foote. On the probability of ancestors in the fossil record, 1996, Paleobiology.

[45] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, 1995.

[46] A. Dawid. The Well-Calibrated Bayesian, 1982.

[47] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[48] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.