An evolutionary model motivated by physicochemical properties of amino acids reveals variation among proteins

Motivation: The relative rates of amino acid interchanges over evolutionary time are likely to vary among proteins, and variation in those rates has the potential to reveal information about the constraints acting on proteins. However, the most straightforward model for estimating relative rates of amino acid substitution is parameter-rich, which makes it impractical for this purpose.

Results: A six-parameter model of amino acid substitution that incorporates information about the physicochemical properties of amino acids was developed. The model showed that amino acid side chain volume, polarity and aromaticity have major impacts on protein evolution. It also revealed variation among proteins in the relative importance of those properties. The same general approach can be used to improve the fit of empirical models such as the commonly used PAM and LG models.

Availability and implementation: Perl code and test data are available from https://github.com/ebraun68/sixparam.
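To make the contrast between a parameter-rich model and a property-based one concrete, the following is a minimal Python sketch and not the authors' Perl implementation. It assumes, purely for illustration, that the exchange rate between two amino acids decays exponentially with a weighted sum of their range-normalized property differences. The functional form, the three weights, the five-residue table and its numbers, and the names `PROPS`, `exchangeability` and `rate_table` are all hypothetical; the abstract does not specify the six parameters of the actual model.

```python
import math
from itertools import combinations

# Illustrative property values on arbitrary scales for five amino acids.
# ASSUMPTION: demo placeholders, not the values used in the paper.
# Columns: side-chain volume, polarity, aromatic flag (1 = aromatic).
PROPS = {
    "L": (111.0, 4.9, 0),
    "I": (111.0, 5.2, 0),
    "F": (132.0, 5.2, 1),
    "D": (54.0, 13.0, 0),
    "E": (83.0, 12.3, 0),
}


def property_ranges(props):
    """Return max - min of each property column, used to normalize differences."""
    columns = zip(*props.values())
    return [(max(c) - min(c)) or 1.0 for c in columns]  # guard against zero range


def exchangeability(a, b, weights, props=PROPS):
    """Hypothetical rate: exp(-sum_k w_k * |x_k(a) - x_k(b)| / range_k).

    Larger weights mean the corresponding property constrains exchanges more.
    """
    ranges = property_ranges(props)
    penalty = sum(
        w * abs(xa - xb) / r
        for w, xa, xb, r in zip(weights, props[a], props[b], ranges)
    )
    return math.exp(-penalty)


def rate_table(weights, props=PROPS):
    """Exchangeabilities for every unordered pair of amino acids in props."""
    return {
        (a, b): exchangeability(a, b, weights, props)
        for a, b in combinations(sorted(props), 2)
    }


if __name__ == "__main__":
    # Hypothetical weights for (volume, polarity, aromaticity).
    weights = (1.0, 2.0, 0.5)
    for pair, rate in rate_table(weights).items():
        print(pair, round(rate, 3))
    # An unconstrained symmetric table over 20 amino acids has one rate per pair.
    print("pairs among 20 amino acids:", 20 * 19 // 2)
```

Under these assumptions a handful of weights determines every pairwise rate, whereas a model with one free rate per amino acid pair has 190 entries to estimate. Fitting the weights separately for each protein and comparing them is one way such a model could expose differences among proteins in the relative importance of each property.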
