Fugu ESTs: new resources for transcription analysis and genome annotation.

The draft Fugu rubripes genome was released in 2002, at which time relatively few cDNAs were available to aid gene annotation. The data presented here describe the sequencing and analysis of 24,398 expressed sequence tags (ESTs) generated from 15 different adult and juvenile Fugu tissues, 74% of which matched protein database entries. Comparison of the EST data with the Fugu genome data predicts that approximately 10,116 gene tags have been generated, covering almost one-third of predicted Fugu genes, which represents a remarkable economy of effort. Comparison with the Washington University zebrafish EST assemblies indicates strong conservation between the two fish species, although significant differences remain. These differences may reflect sequence divergence in the 5' terminal exons and UTRs, but complete EST data sets are not yet available for either species. This project provides new Fugu resources, and the analysis adds significant weight to the argument that EST programs remain an essential resource for genome exploitation and annotation. This is particularly timely given the increasing availability of draft genome sequences from different organisms and the mounting emphasis on gene function and regulation.
