Integrated analysis of the genome and the transcriptome by FANTOM

The key to reliable annotation of a mammalian genome is broad characterisation of its transcriptional output, the transcriptome. FANTOM, the functional annotation of mouse cDNA, is a large-scale analysis of both the genome and the transcriptome of the mouse. In the early stages of this work, transcripts were characterised individually using annotation methods developed within the project. After the timely release of the first draft of the mouse genome sequence, integrating that sequence with these one-by-one annotations yielded further information. Moreover, each transcript was accompanied by its expression profile. Here, the two integrated annotation methods used by FANTOM are reviewed: one-by-one and categorised. One-by-one annotation refers to naming transcripts on the basis of well-known transcripts or their fragments, using the top-down-style pipeline developed mostly by the FANTOM project. Categorised annotation, which refers to transcript grouping, not only helps in naming unknown transcripts but is also expected to become the most widely used method for integrating the genome and the transcriptome.
