Satellog: A database for the identification and prioritization of satellite repeats in disease association studies

Background
To date, 35 human diseases, some of which also exhibit anticipation, have been associated with unstable repeats. Anticipation has been reported in a number of diseases in which repeat expansion may play a role in etiology. Despite the growing importance of unstable repeats in disease, no resource currently exists for prioritizing repeats. Here we present Satellog, a database that catalogs all pure satellite repeats with repeat units of 1–16 nucleotides in the human genome, along with supplementary data. Satellog analyzes each pure repeat in UniGene clusters for evidence of repeat polymorphism.

Results
A total of 5,546 repeats showing evidence of polymorphism within UniGene were identified, providing the first indication of many novel polymorphic sites in the genome. Overall, polymorphic repeats were over-represented in 3'-UTR sequence relative to 5'-UTR and coding sequence. Interestingly, repeat polymorphism within coding sequence was restricted to trinucleotide repeats, whereas UTR sequence tolerated polymorphic repeats across a wider range of repeat periods. For each pure repeat, we also calculate its repeat length percentile rank, its location within or adjacent to EnsEMBL genes, and the expression profile of the associated gene in normal tissues according to the GeneNote database.

Conclusion
Satellog allows repeats to be prioritized dynamically by any of their characteristics (e.g. repeat unit, class, period, length, repeat length percentile rank, genomic co-ordinates), their polymorphism profile within UniGene, their proximity to or presence within gene regions (e.g. CDS, UTR, 15 kb upstream), the metadata of the genes in which they are detected, and the expression profiles of those genes in normal human tissues. Unstable repeats associated with 31 diseases were analyzed in Satellog to evaluate their common repeat properties. The utility of Satellog was highlighted by prioritizing repeats for Huntington's disease and schizophrenia. Satellog is available online at http://satellog.bcgsc.ca.
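To make the notions of a "pure" repeat and a "repeat length percentile rank" concrete, here is a minimal Python sketch under stated assumptions. It is not Satellog's implementation. It assumes that a pure repeat is a run of at least three perfect copies (the MIN_COPIES threshold is hypothetical) of a primitive unit of 1–16 nucleotides, and it computes each repeat's percentile rank among repeats of the same period found in the input sequence, whereas Satellog presumably ranks against its genome-wide catalog. The names find_pure_repeats, percentile_ranks and is_primitive are illustrative only.

```python
"""Minimal sketch: find pure tandem repeats (period 1-16) and rank their lengths.

Assumptions (hypothetical, not Satellog's code): a repeat is "pure" if its unit
is copied perfectly at least MIN_COPIES times; percentile ranks are computed
among repeats of the same period in the input rather than genome-wide.
"""
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass

MIN_COPIES = 3   # hypothetical minimum number of perfect copies
MAX_PERIOD = 16  # repeat units of 1-16 nucleotides, as in the abstract


@dataclass
class Repeat:
    start: int   # 0-based start position in the sequence
    unit: str    # repeat unit, e.g. "CAG"
    copies: int  # number of consecutive perfect copies

    @property
    def length(self) -> int:
        return len(self.unit) * self.copies


def is_primitive(unit: str) -> bool:
    """Return False if unit is itself a repeat of a shorter unit (e.g. "ATAT")."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_pure_repeats(seq: str, max_period: int = MAX_PERIOD) -> list[Repeat]:
    """Scan each period in turn and report maximal runs of perfect unit copies."""
    seq = seq.upper()
    repeats = []
    for p in range(1, max_period + 1):
        i = 0
        while i + p <= len(seq):
            unit = seq[i:i + p]
            # Skip ambiguous bases and units such as "AA" or "CAGCAG" that
            # would duplicate a repeat already reported at a shorter period.
            if "N" in unit or not is_primitive(unit):
                i += 1
                continue
            copies = 1
            # Extend while the next p-length window is an exact copy of the unit.
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= MIN_COPIES:
                repeats.append(Repeat(i, unit, copies))
                i += copies * p  # jump past the run so shifted phases are not re-reported
            else:
                i += 1
    return repeats


def percentile_ranks(repeats: list[Repeat]) -> list[float]:
    """Percent of same-period repeats whose length is <= each repeat's length."""
    lengths_by_period = defaultdict(list)
    for r in repeats:
        lengths_by_period[len(r.unit)].append(r.length)
    for lengths in lengths_by_period.values():
        lengths.sort()
    return [
        100.0 * bisect_right(lengths_by_period[len(r.unit)], r.length)
        / len(lengths_by_period[len(r.unit)])
        for r in repeats
    ]


if __name__ == "__main__":
    seq = "CAG" * 10 + "T" + "CAG" * 4
    reps = find_pure_repeats(seq)
    # Prioritize: repeats that are long relative to their period class come first.
    for r, rank in sorted(zip(reps, percentile_ranks(reps)), key=lambda x: -x[1]):
        print(r.unit, r.start, r.copies, rank)
```

In this sketch, scanning each period separately and jumping past a detected run keeps a CAG run from also being reported in its shifted phases (AGC, GCA), and rejecting non-primitive units keeps it from being reported again as CAGCAG. Additional prioritization criteria of the kind Satellog offers, such as gene region or UniGene polymorphism, would enter as extra sort keys or filters on the same records.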
