Efficient Bayesian inference of general Gaussian models on large phylogenetic trees

Phylogenetic comparative methods correct for shared evolutionary history among a set of non-independent organisms by modeling sample traits as arising from a diffusion process along the branches of a possibly unknown history. To incorporate such uncertainty, we present a scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo (HMC). HMC enables efficient sampling of the constrained model parameters and takes advantage of the tree structure for fast likelihood and gradient computations, yielding algorithmic complexity linear in the number of observations. This approach encompasses a wide family of stochastic processes, including the general Ornstein-Uhlenbeck (OU) process, with possible missing data and measurement errors. We implement inference tools for a biologically relevant subset of these models in the BEAST phylogenetic software package and develop model comparison through marginal likelihood estimation. We apply our approach to study morphological evolution in the superfamily Musteloidea (including weasels and allies) as well as the heritability of HIV virulence. This second problem furnishes a new measure of evolutionary heritability, whose utility we demonstrate through a targeted simulation study.
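To illustrate how tree structure can give a likelihood computation that is linear in the number of observations, the following is a minimal sketch for the simplest special case only: a univariate Brownian motion with a fixed root value, no missing data, and strictly positive branch lengths. It is not the BEAST implementation and does not cover the general OU model or the gradients. The tree encoding, the function names (`bm_log_likelihood`, `log_normal_pdf`) and the example values are assumptions made for this illustration. Each node sends its parent a Gaussian message (mean, variance, log scaling constant), so every branch is visited exactly once.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bm_log_likelihood(tree, tip_values, sigma2, root_value):
    """Log-likelihood of tip values under Brownian motion on a tree.

    Hypothetical encoding: a node is (name, children), where children is a
    list of (child_node, branch_length) pairs and a tip has no children.
    Assumes every branch length is strictly positive.
    """

    def message(node):
        # Returns (mean, var, log_scale): the likelihood of the data below
        # `node`, viewed as a function of the node's own trait value x,
        # equals exp(log_scale) * N(mean; x, var).
        name, children = node
        if not children:
            return tip_values[name], 0.0, 0.0  # tip observed exactly
        mean, var, log_scale = None, None, 0.0
        for child, length in children:
            c_mean, c_var, c_log = message(child)
            s = c_var + sigma2 * length  # add diffusion along the branch
            log_scale += c_log
            if mean is None:
                mean, var = c_mean, s
            else:
                # Product of two Gaussians in x: a constant times a Gaussian.
                log_scale += log_normal_pdf(mean, c_mean, var + s)
                new_var = var * s / (var + s)
                mean = new_var * (mean / var + c_mean / s)
                var = new_var
        return mean, var, log_scale

    mean, var, log_scale = message(tree)
    # Evaluate the root message at the fixed root value.
    return log_scale + log_normal_pdf(mean, root_value, var)


# Example: the three-tip tree ((A:1, B:1):1, C:2) with illustrative values.
example_tree = (
    "root",
    [
        (("AB", [(("A", []), 1.0), (("B", []), 1.0)]), 1.0),
        (("C", []), 2.0),
    ],
)
example_tips = {"A": 0.3, "B": -0.1, "C": 1.2}
example_ll = bm_log_likelihood(example_tree, example_tips, sigma2=1.0, root_value=0.0)
```

The recursion is kept for readability; on very large trees an explicit post-order traversal would avoid Python's recursion limit. The result agrees with the dense multivariate normal likelihood built from the tree-induced covariance matrix, without ever forming or inverting that matrix.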
