Amino acid runs in eukaryotic proteomes and disease associations

We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs and on charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include proteins with long glutamine runs that induce neurological disorders, proteins implicated in various cancers and categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca2+ and K+ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are homologs of Drosophila homeotic proteins. A large number of these proteins are expressed in the nervous system. More than 80% of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences are of glutamine, alanine, and serine, whereas those in human sequences are of glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.
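As an illustration of what counting amino acid runs involves, the following is a minimal sketch, not the procedure used in the study. It treats a run as a stretch of one residue repeated consecutively; the function names and the minimum length of 5 are assumptions chosen for the example, since the abstract states no threshold.

```python
import re
from collections import Counter


def find_runs(sequence, min_length=5):
    """Return (residue, start, length) for each run of a single residue
    that is at least min_length long. min_length is an illustrative
    assumption, not a threshold taken from the study."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # (.) captures one residue; \1{n,} requires it to repeat n more times.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]


def count_runs_by_residue(sequences, min_length=5):
    """Tally how many qualifying runs each residue contributes
    across a collection of protein sequences."""
    counts = Counter()
    for seq in sequences:
        for residue, _start, _length in find_runs(seq, min_length):
            counts[residue] += 1
    return counts
```

Applied to one-letter protein sequences from each proteome, a tally like this would allow the kind of per-residue comparison described above (for example glutamine runs versus asparagine runs), although a real analysis would also need a statistical criterion for deciding which run lengths are significant.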
