DaTeR: error-correcting phylogenetic chronograms using relative time constraints

Abstract

Motivation: A chronogram is a dated phylogenetic tree whose branch lengths have been scaled to represent time. Chronograms are computed from available date estimates (e.g. from dated fossils), which provide absolute time constraints for one or more nodes of an input undated phylogeny, coupled with an appropriate underlying model of evolutionary rate variation along the branches of the phylogeny. However, traditional methods for phylogenetic dating cannot take into account relative time constraints, such as those provided by inferred horizontal transfer events. In many cases, chronograms computed using only absolute time constraints are inconsistent with known relative time constraints.

Results: In this work, we introduce a new approach for phylogenetic dating, Dating Trees using Relative constraints (DaTeR), that can take into account both absolute and relative time constraints. The key idea is to (i) use existing Bayesian approaches for phylogenetic dating to sample posterior chronograms satisfying the desired absolute time constraints, (ii) minimally adjust, or ‘error-correct’, these sampled chronograms so that they satisfy all given relative time constraints, and (iii) aggregate across all error-corrected chronograms. DaTeR uses a constrained optimization framework for the error-correction step, finding minimal deviations from previously assigned dates or branch lengths. We applied DaTeR to a biological dataset of 170 Cyanobacterial taxa and a reliable set of 24 transfer-based relative constraints, under six different molecular dating models. Our extensive analysis of this dataset demonstrates that DaTeR is both highly effective and scalable and that its application can significantly improve estimated chronograms.

Availability and implementation: Freely available from https://compbio.engr.uconn.edu/software/dater/

Supplementary information: Supplementary data are available at Bioinformatics online.
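
The error-correction step described above can be illustrated with a small sketch. It is not DaTeR's implementation: it assumes that "minimal deviation" means least-squares deviation in node ages, that ages are measured as time before present, and that every constraint (tree ancestry or transfer-based) has the form "node a is at least as old as node b". The function name `error_correct_ages`, its parameters, and the use of Dykstra's alternating projections as the solver are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def error_correct_ages(
    ages: Dict[str, float],
    older_than: List[Tuple[str, str]],
    sweeps: int = 200,
) -> Dict[str, float]:
    """Project node ages onto a set of ordering constraints (hypothetical helper).

    Each pair (a, b) in `older_than` requires age[a] >= age[b]. The result
    approximates the ages closest to the input in the least-squares sense
    that satisfy every pair, computed with Dykstra's alternating projections.
    """
    x = dict(ages)
    # Dykstra's method keeps one correction term per constraint.
    corr = [(0.0, 0.0) for _ in older_than]
    for _ in range(sweeps):
        for k, (a, b) in enumerate(older_than):
            # Re-add the correction removed the last time this constraint was visited.
            ya = x[a] + corr[k][0]
            yb = x[b] + corr[k][1]
            if ya >= yb:
                na, nb = ya, yb  # constraint already holds
            else:
                na = nb = 0.5 * (ya + yb)  # nearest point with equal ages
            corr[k] = (ya - na, yb - nb)
            x[a], x[b] = na, nb
    return x


# Toy chronogram: root R with children B and C (ages in arbitrary time units).
sampled = {"R": 10.0, "B": 6.0, "C": 8.0}
# Two ancestry constraints plus one relative constraint: B must be at least as old as C.
constraints = [("R", "B"), ("R", "C"), ("B", "C")]
corrected = error_correct_ages(sampled, constraints)
print(corrected)
```

In this toy case the sampled ages violate the relative constraint, so the sketch moves B and C toward each other and leaves the root untouched. In the workflow the abstract describes, a correction of this kind would be applied to each sampled posterior chronogram before aggregation; the objective and constraint forms actually used by DaTeR may differ from the assumptions made here.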
