The Geometry of the Space of Discrete Coalescent Trees

Computational inference of dated evolutionary histories relies upon various hypotheses about RNA, DNA, and protein sequence mutation rates. Using mutation rates to infer these dated histories is referred to as the molecular clock assumption. Coalescent theory provides a popular class of evolutionary models that implement the molecular clock hypothesis to facilitate computational inference of dated phylogenies. Cancer and virus evolution are two areas where these methods are particularly important. Methodologically, phylogenetic inference methods require a tree space over which the inference is performed, and the geometry of this space plays an important role in the statistical and computational aspects of tree inference algorithms. It has recently been shown that molecular clock trees, and hence coalescent trees, possess a unique geometry, different from that of classical phylogenetic tree spaces, which do not model mutation rates. Here we introduce and study a space of discrete coalescent trees, that is, coalescent trees in which time is assumed to be discrete, an assumption that is inevitable in many computational formalisations. We establish several geometrical properties of this space and show how these properties impact various algorithms used in phylogenetic analyses. Our tree space is a discretisation of a known time tree space, called t-space, and hence our results can be used to approximate solutions to various open problems in t-space. Our tree space is also a generalisation of another known tree space, called the ranked nearest neighbour interchange space; hence the advances in this paper imply new results about ranked trees and generalise existing ones.
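As an illustration of what discrete time means here, the following Python sketch encodes a tree whose internal nodes sit at integer times. It rests on assumptions of ours rather than on the formal definition in this paper: leaves are at time 0, at most one internal node occupies each positive integer time (so the time can serve as the node's identifier), and every parent is strictly older than its children. The encoding and the names `is_discrete_coalescent_tree` and `is_ranked` are hypothetical. The second function picks out the special case in which the internal node times are exactly 1, ..., n-1, which is one way to see how ranked trees fit inside a space of discretely timed trees.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding (an assumption, not the paper's definition): a leaf is a
# string label at time 0; an internal node is identified by its integer time,
# which is unique because we assume at most one coalescence per time step.
Child = Union[str, int]
Tree = Dict[int, Tuple[Child, Child]]


def node_time(c: Child) -> int:
    """Leaves are at time 0; an internal node's key is its time."""
    return 0 if isinstance(c, str) else c


def is_discrete_coalescent_tree(tree: Tree, leaves: Set[str]) -> bool:
    """Check that `tree` is a rooted binary tree on `leaves` whose internal
    nodes sit at distinct positive integer times, each parent strictly older
    than both of its children."""
    n = len(leaves)
    if n < 2 or len(tree) != n - 1:
        return False
    children = []
    for t, pair in tree.items():
        if t < 1:
            return False
        for c in pair:
            if isinstance(c, str):
                if c not in leaves:
                    return False
            elif c not in tree:
                return False
            if node_time(c) >= t:  # times must decrease towards the leaves
                return False
            children.append(c)
    root = max(tree)
    # Every node except the root must be the child of exactly one parent.
    expected = set(leaves) | (set(tree) - {root})
    return len(children) == len(set(children)) and set(children) == expected


def is_ranked(tree: Tree, leaves: Set[str]) -> bool:
    """Special case with no unused time steps: the internal node times are
    exactly 1, ..., n-1, so they act as ranks."""
    return (is_discrete_coalescent_tree(tree, leaves)
            and set(tree) == set(range(1, len(leaves))))


if __name__ == "__main__":
    leaves = {"a", "b", "c", "d"}
    gappy = {1: ("a", "b"), 3: ("c", "d"), 5: (1, 3)}   # time steps 2 and 4 unused
    ranked = {1: ("a", "b"), 2: ("c", "d"), 3: (1, 2)}  # no unused time steps
    print(is_discrete_coalescent_tree(gappy, leaves), is_ranked(gappy, leaves))  # True False
    print(is_ranked(ranked, leaves))  # True
```

Allowing unused time steps, as in `gappy`, is what separates this encoding from a purely ranked one; whether and how the root time is bounded is deliberately left open in this sketch.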
