Tandem repeat copy-number variation in protein-coding regions of human genes

Background

Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

Conclusion

Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
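The percentages quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the variable and function names are illustrative, and it assumes that the "up to 1%" figure comes from multiplying the estimated 6% polymorphism frequency by the 13.9% share of length differences that are not a multiple of three.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def is_frameshifting(length_difference_nt: int) -> bool:
    """A length difference shifts the reading frame unless it is a multiple of 3."""
    return length_difference_nt % 3 != 0


# Figures taken from the abstract.
polymorphic_clusters = 218        # UniGene clusters with a detected polymorphism
proteins_investigated = 13_749    # proteins investigated
estimated_polymorphic_pct = 6.0   # estimated mean frequency across human proteins
non_triplet_pct = 13.9            # polymorphisms not a multiple of three nucleotides

# Share of investigated proteins with a detectable polymorphism.
detected_pct = percent(polymorphic_clusters, proteins_investigated)

# Assumed derivation of the frameshift estimate: 6% of proteins x 13.9% non-triplet.
frameshift_pct = estimated_polymorphic_pct * non_triplet_pct / 100.0

print(f"Detected polymorphic proteins: {detected_pct:.2f}%")
print(f"Estimated frameshifting proteins: {frameshift_pct:.2f}%")
```

Under that assumption the product is a little over 0.8%, which is consistent with the abstract's rounded upper bound of 1%. Applied to the extremes of the reported range, `is_frameshifting` marks the shortest length difference (2 nucleotides) as frame-disrupting and the longest (144 nucleotides) as frame-preserving.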
