Relative evolutionary rate inference in HyPhy with LEISR

We introduce LEISR (Likelihood Estimation of Individual Site Rates, pronounced “laser”), a tool implemented in HyPhy for inferring relative evolutionary rates from protein and nucleotide data. LEISR is based on the popular Rate4Site (Pupko et al., 2002) approach for inferring relative site-wise evolutionary rates, primarily from protein data. We extend the original method for more general use in several key ways: (i) we increase the support for nucleotide data with additional models, (ii) we allow for datasets of arbitrary size, (iii) we support analysis of site-partitioned datasets to correct for the presence of recombination breakpoints, (iv) we produce rate estimates at all sites rather than at just a subset of sites, and (v) we implement LEISR with MPI support for rapid, high-throughput analysis. LEISR is available in HyPhy starting with version 2.3.8 and is accessible as an option in the HyPhy analysis menu (“Relative evolutionary rate inference”), which calls the HyPhy batchfile LEISR.bf.
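To make the idea of a relative site-wise rate concrete, the following is a minimal Python sketch of per-site maximum-likelihood rate estimation. It is not LEISR's code and does not use HyPhy. It assumes a fixed tree with known branch lengths, the JC69 nucleotide model, a single rate per site that scales every branch length, and normalization of the site rates by their mean; the tree encoding and all function names (`jc69_prob`, `partial_likelihoods`, `site_log_likelihood`, `ml_site_rate`) are hypothetical and chosen for illustration only.

```python
import math

NUCS = "ACGT"

def jc69_prob(same, t):
    """JC69 probability of ending in the same (or one specific different) state after length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial_likelihoods(node, column, rate):
    """Felsenstein pruning. A leaf is a name (str); an internal node is a
    list of (child, branch_length) pairs. Branch lengths are scaled by `rate`."""
    if isinstance(node, str):
        return [1.0 if column[node] == x else 0.0 for x in NUCS]
    out = [1.0] * 4
    for child, length in node:
        below = partial_likelihoods(child, column, rate)
        for i in range(4):
            out[i] *= sum(jc69_prob(i == j, rate * length) * below[j] for j in range(4))
    return out

def site_log_likelihood(tree, column, rate):
    """Log-likelihood of one alignment column with equal base frequencies."""
    return math.log(sum(0.25 * p for p in partial_likelihoods(tree, column, rate)))

def ml_site_rate(tree, column, lo=1e-6, hi=50.0, tol=1e-6):
    """Golden-section search for the rate that maximizes the site likelihood.
    Assumes the likelihood is unimodal in the rate on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc = site_log_likelihood(tree, column, c)
    fd = site_log_likelihood(tree, column, d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = site_log_likelihood(tree, column, c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = site_log_likelihood(tree, column, d)
    return (a + b) / 2.0

# Toy four-taxon tree ((A,B),(C,D)) written as a root with three children.
tree = [("A", 0.1), ("B", 0.1), ([("C", 0.1), ("D", 0.1)], 0.05)]
columns = [
    {"A": "A", "B": "A", "C": "A", "D": "A"},  # invariant site
    {"A": "A", "B": "A", "C": "G", "D": "G"},  # one change between the two cherries
]
rates = [ml_site_rate(tree, col) for col in columns]
mean_rate = sum(rates) / len(rates)
relative = [r / mean_rate for r in rates]  # relative rates average to 1 (assumed normalization)
print(rates, relative)
```

In this sketch the invariant column is driven to the lower bound of the search interval, while the variable column receives a clearly larger rate. Dividing by the mean expresses each site's rate relative to the alignment-wide average, which is one common way to report relative rates.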

[1] H. Kishino, et al. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA, 1985, Journal of Molecular Evolution.

[2] Sergei L. Kosakovsky Pond, et al. FUBAR: a fast, unconstrained bayesian approximation for inferring selection, 2013, Molecular Biology and Evolution.

[3] N. Ben-Tal, et al. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior, 2004, Molecular Biology and Evolution.

[4] Z. Yang, et al. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites, 1993, Molecular Biology and Evolution.

[5] S. Muse, et al. Site-to-site variation of synonymous substitution rates, 2005, Molecular Biology and Evolution.

[6] Itay Mayrose, et al. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues, 2002, ISMB.

[7] Benjamin R. Jack, et al. Measuring evolutionary rates of proteins in a structural context, 2017, F1000Research.

[8] S. Whelan, et al. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach, 2001, Molecular Biology and Evolution.

[9] Simon Tavaré, et al. Lines-of-descent and genealogical processes, and their applications in population genetics models, 1984, Advances in Applied Probability.

[10] Meggan E. Craft, et al. “One Health” or Three? Publication Silos Among the One Health Disciplines, 2016, PLoS Biology.

[11] T. Jukes. Chapter 24 – Evolution of Protein Molecules, 1969.

[12] Stephanie J. Spielman, et al. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies, 2015, bioRxiv.

[13] S. Tavaré, et al. Line-of-descent and genealogical processes, and their applications in population genetics models, 1984, Theoretical Population Biology.

[14] Sergei L. Kosakovsky Pond, et al. Not so different after all: a comparison of methods for detecting amino acid sites under selection, 2005, Molecular Biology and Evolution.

[15] C. Cox, et al. A 20-state empirical amino-acid substitution model for green plant chloroplasts, 2013, Molecular Phylogenetics and Evolution.

[16] Sergei L. Kosakovsky Pond, et al. On the Validity of Evolutionary Models with Site-Specific Parameters, 2014, PLoS ONE.

[17] C. C. Dang, et al. Improved mitochondrial amino acid substitution models for metazoan evolutionary studies, 2017, BMC Evolutionary Biology.

[18] David C. Nickle, et al. HIV-Specific Probabilistic Models of Protein Evolution, 2007, PLoS ONE.

[19] O. Gascuel, et al. An improved general amino acid replacement matrix, 2008, Molecular Biology and Evolution.

[20] William R. Taylor, et al. The rapid generation of mutation data matrices from protein sequences, 1992, Comput. Appl. Biosci.

[21] David Posada, et al. Automated phylogenetic detection of recombination using a genetic algorithm, 2006, Molecular Biology and Evolution.

[22] Claus O. Wilke, et al. Causes of evolutionary rate variation among protein sites, 2016, Nature Reviews Genetics.

[23] Benjamin R. Jack, et al. Functional Sites Induce Long-Range Evolutionary Constraints in Enzymes, 2016, PLoS Biology.

[24] Stephanie J. Spielman, et al. A Comparison of One-Rate and Two-Rate Inference Frameworks for Site-Specific dN/dS Estimation, 2015, Genetics.

[25] S. Jeffery. Evolution of Protein Molecules, 1979.