The Universal Trend of Amino Acid Gain–Loss is Caused by CpG Hypermutability

Understanding what drives changes in the amino acid composition of proteins is essential for understanding how protein functions evolve. It has been known since the early 1970s that some amino acids are increasing in frequency in protein sequences while others are decreasing. Recently, the trends of amino acid change were found to be similar across 15 taxa representing Bacteria, Archaea, and Eukaryota. However, the cause of this similarity in the gains and losses of amino acids has remained under debate. Here, we show that the trend can be explained simply by CpG hypermutability. We found that the frequency of amino acids encoded by codons containing TpG or CpA dinucleotides is increasing, while that of amino acids encoded by codons containing CpG dinucleotides is decreasing. We also found that organisms lacking DNA methyltransferase, the enzyme that methylates CpG dinucleotides and thereby induces CpG hypermutability, show different trends of amino acid gain and loss. Incorporating CpG hypermutability into models of protein evolution will improve studies of protein evolution in different organisms.
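To make the codon-level argument concrete, the following is a minimal Python sketch, not the authors' analysis, that lists which amino acids have at least one codon containing a given dinucleotide under the standard genetic code. It relies on the well-known fact that deamination of a methylated cytosine in CpG yields TpG on the same strand and CpA on the complementary strand. The function name `amino_acids_with` is hypothetical, and the sketch assumes that only dinucleotides lying within a single codon matter; dinucleotides spanning codon boundaries are ignored.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated
# with bases in the order T, C, A, G. '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def amino_acids_with(dinucleotide):
    """Return the amino acids that have at least one codon containing
    the given dinucleotide (hypothetical helper; stop codons excluded,
    codon-boundary dinucleotides ignored)."""
    return {
        aa
        for codon, aa in CODON_TABLE.items()
        if dinucleotide in codon and aa != "*"
    }


# CpG-containing codons are expected to be depleted by hypermutability;
# TpG- and CpA-containing codons are the products of CpG decay.
cpg_set = amino_acids_with("CG")
tpg_set = amino_acids_with("TG")
cpa_set = amino_acids_with("CA")

for label, group in (("CpG", cpg_set), ("TpG", tpg_set), ("CpA", cpa_set)):
    print(label, "".join(sorted(group)))
```

Because an amino acid can appear in more than one set (for example, serine has both a CpG-containing and a CpA-containing codon), a set listing like this only identifies candidates; it does not by itself predict the net direction of change for any amino acid.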
