The impact of ancestral population size and incomplete lineage sorting on Bayesian estimation of species divergence times

Although the effects of the coalescent process on sequence divergence and genealogies are well understood, the vast majority of studies that use molecular sequences to estimate times of divergence among species have failed to account for the coalescent process. Here we study the impact of ancestral population size and incomplete lineage sorting on Bayesian estimates of species divergence times under the molecular clock when the inference model ignores the coalescent process. Using a combination of mathematical analysis, computer simulations and analysis of real data, we find that the errors in estimates of times and of the molecular rate can be substantial when ancestral populations are large and when there is substantial incomplete lineage sorting. For example, in a simple three-species case, we find that if the most precise fossil calibration is placed on the root of the phylogeny, the age of the internal node is overestimated, whereas if the most precise calibration is placed on the internal node, the age of the root is underestimated. In both cases, the molecular rate is overestimated. Using simulations on a phylogeny of nine species, we show that substantial errors in time and rate estimates can arise even when dating ancient divergence events. We analyse the hominoid phylogeny and show that estimates of the neutral mutation rate obtained while ignoring the coalescent are too high. When we instead use a coalescent-based technique to obtain geological times of divergence, we obtain estimates of the mutation rate that fall within the range of experimental estimates, together with substantially older divergence times within the phylogeny.

Keywords: Ancestral polymorphism, Incomplete lineage sorting, Divergence time estimation, Gene tree, Species tree

Current Zoology 61 (5): 874-885, 2015.
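The direction of the biases described for the three-species case can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the Bayesian clock analysis used in the study; it is a simplified illustration under stated assumptions: the species tree is ((A,B),C) with one sequence per species, times are measured in expected mutations per site, θ = 4Nμ for each ancestral population, and an analysis that ignores the coalescent is caricatured as treating expected gene-tree coalescence times as if they were species divergence times, then converting them to years with a single exact calibration. Under the multispecies coalescent, A and B fail to coalesce in their ancestral population with probability p = exp(-2(τ0 - τ1)/θ1), from which E[t_AB] = τ1 + (1 - p)θ1/2 + pθ0/2 and E[t_AC] = τ0 + θ0/2 follow; the sketch checks E[t_AB] and the standard gene-tree discordance probability (2/3)p by simulation. All function names and parameter values are hypothetical.

```python
import math
import random


def expected_tmrca(tau0, tau1, theta0, theta1):
    """Expected pairwise coalescence times for species tree ((A,B),C).

    Times are in expected mutations per site; tau0 is the root age, tau1 the
    age of the A-B ancestor, and theta = 4*N*mu for the root (theta0) and
    A-B ancestral (theta1) populations. One sequence is sampled per species.
    """
    # Probability that A and B do not coalesce in their ancestral population.
    p = math.exp(-2.0 * (tau0 - tau1) / theta1)
    t_ab = tau1 + (1.0 - p) * theta1 / 2.0 + p * theta0 / 2.0
    # A and C can only meet in the root population.
    t_ac = tau0 + theta0 / 2.0
    return t_ab, t_ac


def simulate_locus(tau0, tau1, theta0, theta1, rng):
    """Simulate one gene tree; return (A-B coalescence time, matches species tree)."""
    # Two lineages coalesce at rate 2/theta (time in mutation units).
    x = rng.expovariate(2.0 / theta1)
    if tau1 + x < tau0:
        return tau1 + x, True
    # Three lineages enter the root population: total rate 3 * 2/theta0.
    t_first = tau0 + rng.expovariate(6.0 / theta0)
    pair = rng.randrange(3)  # 0 = (A,B), 1 = (A,C), 2 = (B,C), equally likely
    if pair == 0:
        return t_first, True
    # A or B joined C first; A and B meet only at the final coalescence.
    return t_first + rng.expovariate(2.0 / theta0), False


def monte_carlo(tau0, tau1, theta0, theta1, n=100_000, seed=1):
    """Mean simulated A-B coalescence time and fraction of discordant gene trees."""
    rng = random.Random(seed)
    total, discordant = 0.0, 0
    for _ in range(n):
        t_ab, concordant = simulate_locus(tau0, tau1, theta0, theta1, rng)
        total += t_ab
        discordant += not concordant
    return total / n, discordant / n


def naive_dates(T0, T1, mu, theta0, theta1, calibrated_node):
    """Ignore the coalescent: treat expected gene-tree node heights as species
    node heights and convert them to time using one exact calibration.

    T0, T1 are the true ages (e.g. Myr); mu is the true rate per site per time unit.
    Returns (estimated rate, root age, internal-node age).
    """
    t_ab, t_ac = expected_tmrca(mu * T0, mu * T1, theta0, theta1)
    if calibrated_node == "root":
        rate = t_ac / T0
        return rate, T0, t_ab / rate
    if calibrated_node == "internal":
        rate = t_ab / T1
        return rate, t_ac / rate, T1
    raise ValueError("calibrated_node must be 'root' or 'internal'")


if __name__ == "__main__":
    T0, T1, mu = 8.0, 6.0, 1e-3   # hypothetical ages (Myr) and rate (per site per Myr)
    theta0 = theta1 = 0.004        # hypothetical ancestral theta = 4*N*mu
    tau0, tau1 = mu * T0, mu * T1
    print("expected t_AB, t_AC:", expected_tmrca(tau0, tau1, theta0, theta1))
    print("simulated t_AB, P(discordant):", monte_carlo(tau0, tau1, theta0, theta1))
    print("(2/3)p:", 2.0 / 3.0 * math.exp(-2.0 * (tau0 - tau1) / theta1))
    for node in ("root", "internal"):
        print(node, "calibrated -> rate, root age, internal age:",
              naive_dates(T0, T1, mu, theta0, theta1, node))
```

With these illustrative values, calibrating the root gives a rate of 0.00125 per site per Myr (25% above the true 0.001) and an internal-node age of 6.4 Myr instead of 6.0, while calibrating the internal node gives a rate of about 0.00133 and a root age of 7.5 Myr instead of 8.0. Both patterns point in the same directions as the three-species results summarised in the abstract, although the Bayesian analysis itself is considerably more involved than this caricature.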
