Bayesian codon substitution modelling to identify sources of pathogen evolutionary rate variation

Phylodynamic reconstructions rely on a measurable molecular footprint of epidemic processes in pathogen genomes. Identifying the factors that govern the tempo and mode by which these processes leave that footprint is an important step towards understanding infectious disease evolution. Discriminating between synonymous and non-synonymous substitution rates is crucial for testing hypotheses about the sources of evolutionary rate variation. Here, we implement a codon substitution model in a Bayesian statistical framework to estimate absolute rates of synonymous and non-synonymous substitution when the evolutionary history is unknown. To demonstrate how this model can provide critical insights into pathogen evolutionary dynamics, we adopt hierarchical phylogenetic modelling with fixed effects and apply it to two viral examples. Using within-host HIV-1 data from patients with different host genetic backgrounds and different disease progression rates, we show that viral populations have higher absolute synonymous substitution rates in patients with faster disease progression, probably reflecting faster replication rates. We also re-analyse rabies data from different bat species in the Americas to demonstrate that climate predicts absolute synonymous substitution rates, which can be attributed to climate-associated bat activity and viral transmission dynamics. In conclusion, our model for estimating absolute rates of synonymous and non-synonymous substitution can provide a powerful approach for investigating how host ecology shapes the tempo of pathogen evolution.
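The distinction between synonymous and non-synonymous change that the model relies on can be illustrated with a minimal Python sketch. It is not the implementation described above: it assumes the standard genetic code and the common codon-model convention that only single-nucleotide changes between sense codons receive a non-zero instantaneous rate. The names `classify` and `relative_rate`, and the parameters `alpha` (synonymous rate) and `beta` (non-synonymous rate), are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE_CODONS = [c for c, aa in CODE.items() if aa != "*"]


def classify(codon_from, codon_to):
    """Return 'syn' or 'nonsyn' for a single-nucleotide change between
    sense codons, and None for anything else (stop codons involved,
    identical codons, or more than one nucleotide difference)."""
    if CODE[codon_from] == "*" or CODE[codon_to] == "*":
        return None
    n_diff = sum(a != b for a, b in zip(codon_from, codon_to))
    if n_diff != 1:
        return None
    return "syn" if CODE[codon_from] == CODE[codon_to] else "nonsyn"


def relative_rate(codon_from, codon_to, alpha, beta):
    """Hypothetical rate assignment: alpha for synonymous changes,
    beta for non-synonymous changes, 0.0 for disallowed changes."""
    kind = classify(codon_from, codon_to)
    if kind == "syn":
        return alpha
    if kind == "nonsyn":
        return beta
    return 0.0


if __name__ == "__main__":
    print(classify("CTT", "CTC"))   # Leu -> Leu
    print(classify("ATG", "ATA"))   # Met -> Ile
    print(relative_rate("ATG", "ATA", alpha=1.0, beta=0.2))
```

Keeping `alpha` and `beta` as separate parameters, rather than estimating only their ratio, is what allows synonymous and non-synonymous rates to be reported individually once they are scaled to time in a dated phylogeny.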
