FREP: a database of functional repeats in mouse cDNAs

The FREP database (http://facts.gsc.riken.go.jp/FREP/) contains 31 396 RepeatMasker-identified non-redundant variant repeat sequences derived from 16 527 mouse cDNAs with protein-coding potential. The repeats were computationally associated with potential effects on transcriptional variation, translation, protein function or involvement in disease to identify Functional REPeats (FREPs). FREPs are defined by the (i) occurrence of exon-exon boundaries in repeats, (ii) presence of polyadenylation sites in 3'UTR-located repeats, (iii) effect on translation, (iv) position in the protein-coding region or protein domains or (v) conditional association with disease MeSH terms. Currently the database contains 9261 (29.5%) inferred FREPs derived from 6861 (41.5%) mouse cDNAs. Integrated evidence of the functional assignments and dynamically generated sequence similarity search results support the exploration and annotation of functional, ancestral or taxon-specific repeats. Keyword and pre-selected feature searches (e.g. coding sequence-repeat or splice site-repeat relations) support intuitive database querying as well as the retrieval of repeat sequences. Integrated sequence search and alignment tools allow the analysis of known or identification of new functional repeat candidates. FREP is a unique resource for illuminating the role of transposons and repetitive sequences in shaping the coding part of the mouse transcriptome and for selecting the appropriate experimental model to study diseases with suspected repeat etiology contributions.
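As an illustration only, the five defining criteria can be read as a simple rule: a repeat counts as a FREP if at least one of (i) to (v) holds. The Python sketch below assumes this "any criterion" reading; the class, field and function names are hypothetical and do not reflect the actual FREP schema or pipeline. It also rechecks the two reported proportions from the counts given above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepeatEvidence:
    """Hypothetical evidence flags, one per criterion (i)-(v) of the abstract."""
    spans_exon_exon_boundary: bool = False    # (i)
    polya_site_in_3utr_repeat: bool = False   # (ii)
    affects_translation: bool = False         # (iii)
    in_coding_region_or_domain: bool = False  # (iv)
    disease_mesh_association: bool = False    # (v)


def frep_criteria(ev: RepeatEvidence) -> List[str]:
    """Return the labels of the criteria that this repeat satisfies."""
    checks = [
        ("i", ev.spans_exon_exon_boundary),
        ("ii", ev.polya_site_in_3utr_repeat),
        ("iii", ev.affects_translation),
        ("iv", ev.in_coding_region_or_domain),
        ("v", ev.disease_mesh_association),
    ]
    return [label for label, hit in checks if hit]


def is_frep(ev: RepeatEvidence) -> bool:
    """Assumed rule: a repeat is a FREP if any single criterion holds."""
    return bool(frep_criteria(ev))


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    example = RepeatEvidence(polya_site_in_3utr_repeat=True,
                             disease_mesh_association=True)
    print(frep_criteria(example), is_frep(example))
    # Counts taken from the abstract.
    print(percent(9261, 31396), percent(6861, 16527))
```

Returning the list of satisfied criteria, not just a yes/no flag, keeps the supporting evidence visible, in keeping with the integrated evidence of functional assignments that the database presents.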
