Amino acid compositions contribute to the proteins’ evolution under the influence of their abundances and genomic GC content

Studies in eukaryotes have reported inconsistent results on the association between evolutionary rates and the amino acid composition of proteins, and few studies have examined how amino acid composition influences evolutionary rates in bacteria. We therefore constructed linear regression models relating amino acid composition frequencies to evolutionary rates in bacteria. Across the 273 bacterial organisms investigated, the composition of all amino acids explains on average 21.5% of the variation in evolutionary rates. In five model organisms, amino acid composition contributes more to the variation in evolutionary rates than protein abundance or the frequency of optimal codons. The contribution of each individual amino acid to the evolutionary rate varies among organisms. The closer the GC content of a genome is to its maximum or minimum, the stronger the correlation between amino acid content and the evolutionary rate of proteins in that genome. The amino acids that contribute significantly to evolutionary rates can be grouped into GC-rich and AT-rich amino acids. In addition, amino acids that are abundant in the proteome contribute more to evolutionary rates than amino acids that are rare. In summary, amino acid composition contributes significantly to the rate of evolution in bacterial organisms, and this contribution is in turn affected by genomic GC content.
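The regression described above can be illustrated with a minimal sketch: compute amino acid frequencies per protein, fit an ordinary least squares model of evolutionary rate on those frequencies, and report the fraction of variance explained (R²). This is an illustration under stated assumptions, not the authors' pipeline. The function names (`composition`, `solve`, `r_squared`) are hypothetical, the toy sequences and rates are placeholders rather than data from the study, and plain least squares is assumed because the abstract does not specify the fitting details. Because the 20 frequencies of a protein sum to 1, at least one amino acid has to be left out when an intercept is included.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20 amino acid frequencies of one protein sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design; drop a collinear predictor")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y):
    """Fit y on an intercept plus the columns of X by OLS; return R^2."""
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Toy usage with placeholder values (not data from the study).
toy_seqs = ["MKKLAAG", "MGGSAKL", "MAAAKGL", "MKLGGGA", "MLLKAGA"]
toy_rates = [0.10, 0.30, 0.15, 0.35, 0.20]
# Only two amino acid frequencies are used because the toy set has five proteins.
X = [[composition(s)[AMINO_ACIDS.index(a)] for a in "AG"] for s in toy_seqs]
print(round(r_squared(X, toy_rates), 3))
```

For a real proteome, the same call would take 19 of the 20 frequencies per protein (for example `composition(s)[:-1]`) together with a measured rate for each protein, and would need many more proteins than predictors for R² to be meaningful.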
