Functional annotation of a full-length mouse cDNA collection

The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.
