Capturing heterotachy through multi-gamma site models

Most methods for phylogenetic analysis of gene sequence alignments assume that the mechanism of evolution is constant through time. It is recognised that some sites evolve faster than others, and this can be captured using a gamma rate heterogeneity model. Further, some species have shorter replication times than others, which results in faster rates of substitution in some lineages. This lineage-specific rate variation can be captured to some extent by relaxed clock models. However, it is also clear that additional, poorly characterised features of sequence data can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant time-reversible substitution models. Extreme lineage-specific rate differences matter because they lead both to errors in reconstructing evolutionary relationships and to biased estimates of the age of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land plants. For many real-world data sets, we find a much better fit with multi-gamma site models as well as substantial differences in ancestral node date estimates.
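To make the idea of branch-dependent gamma rate heterogeneity concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard discrete-gamma approximation with equal-probability categories and category means, and it assumes that a site stays in the same category across the tree while the gamma shape parameter differs between branches. The function names, the branch labels, and the shape values 0.5 and 2.0 are hypothetical and chosen only for illustration.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    # Series: sum over n of x^n / (a (a+1) ... (a+n)); all terms are positive.
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(a, p):
    """Quantile of a Gamma(shape=a, scale=1) distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories of a mean-one gamma."""
    cuts = [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # The partial mean up to a cut point is an incomplete gamma with shape alpha + 1.
    cdf = [0.0] + [reg_lower_gamma(alpha + 1.0, q) for q in cuts] + [1.0]
    return [k * (cdf[i + 1] - cdf[i]) for i in range(k)]


# Hypothetical shape parameters for two branches (illustrative values only).
branch_alpha = {"branch_A": 0.5, "branch_B": 2.0}
rates = {name: discrete_gamma_rates(a) for name, a in branch_alpha.items()}

for name, category_rates in rates.items():
    print(name, [round(r, 4) for r in category_rates])
```

Under these assumptions, a site in category k evolves at rate `rates["branch_A"][k]` on one branch and `rates["branch_B"][k]` on the other. The only extra work over a single-gamma model is one short rate table per distinct shape value, which is in line with the claim of negligible added cost for likelihood calculations.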
