Pseudo Dollo models for the evolution of binary characters along a tree

The stochastic Dollo model is a model for capturing the evolution of features, for example cognate data in language evolution. However, it is rather sensitive to borrowing events, coding errors, semantic shift and other anomalies, so other models, in particular the covarion model, tend to fit the data better. Here, we introduce the pseudo Dollo model, a model of character evolution along a tree that can be formulated as a three-state continuous-time Markov chain (CTMC). The initial state represents absence of a feature; a birth event then makes the feature present. A death event can follow, so that the feature becomes absent again. However, no new birth events are allowed after a death event has taken place. We examine the model in a fully Bayesian setting and demonstrate that it can fit better than some popular alternative models on some real-world datasets. Some variations on the pseudo Dollo model are introduced as well, including the multi-state pseudo Dollo model and the pseudo Dollo covarion model. The model is implemented in the open-source software Babel, a package for BEAST [2] licensed under the LGPL. A user-friendly way to set up an analysis is available through BEAUti, the graphical user interface of BEAST.
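The three-state chain described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Babel implementation: the state labels (0 = never born, 1 = present, 2 = lost), the parameter names `birth_rate` and `death_rate`, and the function name are all hypothetical, and the sketch assumes constant rates along a branch of length `t`. Because the only allowed moves are 0 to 1 and 1 to 2, the transition probabilities have a simple closed form.

```python
import math


def pseudo_dollo_transition_matrix(birth_rate, death_rate, t):
    """Transition probabilities P(t) for a three-state chain with
    states 0 (never born), 1 (present) and 2 (lost, absorbing).

    Assumed rate matrix (names and parameterisation are illustrative):
        Q = [[-b,  b,  0],
             [ 0, -d,  d],
             [ 0,  0,  0]]
    """
    b, d = birth_rate, death_rate
    p00 = math.exp(-b * t)  # no birth event yet
    p11 = math.exp(-d * t)  # feature born and not yet lost
    if math.isclose(b, d):
        # Limit of the general formula as d approaches b.
        p01 = b * t * math.exp(-b * t)
    else:
        p01 = b / (d - b) * (math.exp(-b * t) - math.exp(-d * t))
    p02 = 1.0 - p00 - p01  # born and lost again within time t
    return [
        [p00, p01, p02],
        [0.0, p11, 1.0 - p11],
        [0.0, 0.0, 1.0],  # no rebirth after a death event
    ]


# Example: a branch of length 1.0 with arbitrary illustrative rates.
P = pseudo_dollo_transition_matrix(birth_rate=0.5, death_rate=1.5, t=1.0)
```

With binary data, states 0 and 2 would both be observed as absence and only state 1 as presence, so the probability of observing the feature at the end of a branch that started in state 0 is `P[0][1]`.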

[1] Tanja Stadler, et al. Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration, 2014, PLoS Comput. Biol.

[2] William J. Stewart, et al. Introduction to the numerical solution of Markov Chains, 1994.

[3] Dong Xie, et al. BEAST 2: A Software Platform for Bayesian Evolutionary Analysis, 2014, PLoS Comput. Biol.

[4] R. Bouckaert, et al. Model Selection and Parameter Inference in Phylogenetics Using Nested Sampling, 2017, Systematic Biology.

[5] Chundra Cathcart, et al. Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis, 2015.

[6] Christopher J. Lee, et al. Wagner and Dollo: a stochastic duet by composing two parsimonious solos, 2008, Systematic Biology.

[7] R. Fildes. Journal of the American Statistical Association: William S. Cleveland, Marylyn E. McGill and Robert McGill, The shape parameter for a two variable graph, 83 (1988) 289-300, 1989.

[8] Remco R. Bouckaert, et al. Bayesian Evolutionary Analysis with BEAST, 2015.

[9] Simon J. Greenhill, et al. Mapping the Origins and Expansion of the Indo-European Language Family, 2012, Science.

[10] Andrew Garrett, et al. Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis: Supplementary materials, 2015.

[11] Simon J. Greenhill, et al. Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement, 2009, Science.

[12] R. Gray, et al. Language-tree divergence times support the Anatolian theory of Indo-European origin, 2003, Nature.

[13] Ziheng Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods, 1994, Journal of Molecular Evolution.

[14] Geoff K. Nicholls, et al. Dated ancestral trees from binary trait data and their application to the diversification of languages, 2007, arXiv:0711.1874.

[15] M. Steel, et al. Modeling the covarion hypothesis of nucleotide substitution, 1998, Mathematical Biosciences.

[16] Louis Dollo, et al. Les lois de l'évolution, 1893.

[17] R. Bouckaert, et al. Bayesian phylolinguistics reveals the internal structure of the Transeurasian family, 2018, Journal of Language Evolution.

[18] S. Ho, et al. Relaxed Phylogenetics and Dating with Confidence, 2006, PLoS Biology.