The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data

Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
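The two pass criteria described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names (`shuffle_dates`, `passes_standard`, `passes_conservative`) are hypothetical, the rate estimates and 95% credible intervals are assumed to come from separate Bayesian phylogenetic runs that are not shown, and the example numbers are invented. The shuffle is a plain permutation of sampling times across sequences; it does not implement the modified randomization scheme that the abstract says is needed when sampling times are unevenly distributed in the tree.

```python
import random
from typing import List, Sequence, Tuple

# A 95% credible interval as (lower, upper). Hypothetical type alias.
Interval = Tuple[float, float]


def shuffle_dates(dates: Sequence[float], rng: random.Random) -> List[float]:
    """Return a random permutation of the sampling times (one replicate)."""
    shuffled = list(dates)
    rng.shuffle(shuffled)
    return shuffled


def passes_standard(mean_rate: float, randomized: Sequence[Interval]) -> bool:
    """Standard criterion: the mean rate estimated with the correct sampling
    times lies outside the 95% credible interval of every randomized replicate."""
    return all(not (lo <= mean_rate <= hi) for lo, hi in randomized)


def passes_conservative(true_interval: Interval,
                        randomized: Sequence[Interval]) -> bool:
    """Conservative criterion: the 95% credible interval obtained with the
    correct sampling times overlaps none of the randomized intervals."""
    t_lo, t_hi = true_interval
    return all(t_hi < lo or hi < t_lo for lo, hi in randomized)


# Invented example values (substitutions/site/year), for illustration only.
true_mean = 1.0e-3
true_ci = (6.0e-4, 1.4e-3)
randomized_cis = [(1.0e-5, 5.0e-4), (2.0e-5, 7.0e-4)]

standard_ok = passes_standard(true_mean, randomized_cis)
conservative_ok = passes_conservative(true_ci, randomized_cis)
```

With these invented values the estimate passes the standard criterion, because the mean lies outside both randomized intervals, but fails the conservative one, because the second randomized interval overlaps the interval obtained with the correct sampling times. This is the kind of case in which the stricter criterion changes the conclusion.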
