Mutational bias affects protein evolution in flowering plants.

Amino acid sequences from several thousand homologous gene pairs were compared for two plant genomes, Oryza sativa and Arabidopsis thaliana. The Arabidopsis genes all have similar G+C (guanine plus cytosine) contents, whereas their homologs in rice span a wide range of G+C levels. The results show that rice genes whose nucleotide composition has diverged (specifically, toward higher G+C content) show a corresponding, predictable change in the amino acid composition of the encoded proteins relative to their Arabidopsis homologs. This trend was not seen in a "control" set of rice genes whose nucleotide composition was closer to that of their Arabidopsis homologs. Beyond this overall difference in the amino acid composition of the homologous proteins, we also investigated the biased patterns of amino acid substitution since the divergence of the two species. We found that the amino acid exchange matrix was highly asymmetric when the high-G+C rice genes were compared with their Arabidopsis homologs. Finally, we investigated the possible causes of this biased pattern of sequence evolution. Our results indicate that the biased pattern of protein evolution is the consequence, rather than the cause, of the corresponding changes in nucleotide content. In fact, the asymmetry in the patterns of substitution is even more marked at synonymous nucleotide sites. Surprisingly, there is a very strong negative correlation between the level of nucleotide bias and the length of the coding sequences within the rice genome. This difference in gene length may provide important clues about the underlying mechanisms.
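
The two quantities the abstract relies on, the G+C content of a coding sequence and the asymmetry of an amino acid exchange matrix, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`gc_content`, `exchange_counts`, `asymmetry`) and the toy sequences are hypothetical, and the sketch assumes the two proteins have already been aligned, with `-` marking gaps. With only two species and no outgroup, a count such as K→R records only that one homolog carries K where the other carries R; it does not establish the direction of change.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def exchange_counts(aligned_a, aligned_b, gap="-"):
    """Count residue pairs (x in A, y in B) at aligned positions where x != y.

    Positions with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for x, y in zip(aligned_a, aligned_b):
        if x != y and gap not in (x, y):
            counts[(x, y)] += 1
    return counts


def asymmetry(counts, x, y):
    """Excess of x->y exchanges over y->x; zero for a symmetric matrix."""
    return counts[(x, y)] - counts[(y, x)]


# Toy, hypothetical aligned protein fragments (not real data).
arabidopsis = "MKKNIF-K"
rice = "MRKAGFSR"
counts = exchange_counts(arabidopsis, rice)
k_to_r_excess = asymmetry(counts, "K", "R")
```

In an analysis of the kind described, such counts would be pooled over many homologous pairs before the x→y and y→x totals are compared; a systematic excess in one direction is what an asymmetric exchange matrix means.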
