Correlations between nucleotide frequencies and amino acid composition in 115 bacterial species.

We studied the correlations between amino acid composition and mononucleotide and dinucleotide frequencies in 115 bacterial genomes of varying G+C content. Observed amino acid frequencies were compared with those expected from the actual mononucleotide and dinucleotide frequencies. Both mononucleotide and dinucleotide frequencies correlated well with amino acid frequencies, and the dinucleotide frequencies correlated more strongly. Despite these strong correlations, the observed frequencies of some amino acids, in particular Arg, Val, Asp, Glu, Ser, and Cys, differed consistently from the predicted values in all genomes. We suggest that these deviations are a consequence of selection pressure at the level of amino acids, whereas the close agreement with predictions for residues such as Thr, Phe, Lys, and Asn arises only from mutation and selection pressure at the level of the nucleic acid sequences.
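
As an illustration of what an "expected" amino acid frequency can mean here, the sketch below derives one from mononucleotide frequencies alone. It is a minimal example under stated assumptions, not necessarily the procedure used in the study: it assumes the standard genetic code, treats the three codon positions as independent draws from the same base frequencies, and renormalizes over the 61 sense codons. The names `CODON_TABLE` and `expected_aa_freqs` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third position varies fastest); "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_aa_freqs(base_freqs):
    """Expected amino acid frequencies from mononucleotide frequencies.

    Assumption: each codon's probability is the product of the frequencies
    of its three bases (positions treated as independent). Stop codons are
    dropped and the remaining probabilities are renormalized to sum to 1.
    """
    totals = {}
    for codon, aa in CODON_TABLE.items():
        p = base_freqs[codon[0]] * base_freqs[codon[1]] * base_freqs[codon[2]]
        totals[aa] = totals.get(aa, 0.0) + p
    totals.pop("*", None)  # exclude stop codons
    sense_total = sum(totals.values())
    return {aa: p / sense_total for aa, p in totals.items()}


# Illustrative inputs only (not data from the study).
uniform = expected_aa_freqs({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
gc_rich = expected_aa_freqs({"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15})
```

Comparing such expected values with the frequencies observed in the annotated proteins of a genome gives a per-residue deviation. A dinucleotide-based expectation would replace the simple product with terms conditioned on neighboring bases.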
