From masking repeats to identifying functional repeats in the mouse transcriptome

The back-to-back release of the mouse genome and the functionally annotated RIKEN mouse full-length cDNA collection was an important milestone in mammalian genomics. Yet much of the data remain to be explored in terms of biological effects and mechanisms. For example, interspersed repeats account for 39 per cent of the mouse genome sequence and 11 per cent of representative transcripts. Compared with human, a considerable number of transposable repeat elements are still active and propagating in mouse. While existing repeat databases and tools assist the classification of repeats or the identification of new repeats, there is little bioinformatic support for exploring the extent and role of repeats in transcriptional variation, modulation of protein function, or gene regulatory events. Since the mouse is used as a model organism to study human genes and their disease associations, this review focuses on information extraction and collation that capture the functional context of repeats in mouse transcripts, to facilitate biological interpretation and the extrapolation of findings to human.
