Shotgun sequencing of the human transcriptome with ORF expressed sequence tags.

Theoretical considerations predict that amplification of expressed gene transcripts by reverse transcription-PCR using arbitrarily chosen primers will result in the preferential amplification of the central portion of the transcript. Systematic, high-throughput sequencing of such products would result in an expressed sequence tag (EST) database consisting of central, generally coding regions of expressed genes. Such a database would add significant value to existing public EST databases, which consist mostly of sequences derived from the extremities of cDNAs, and facilitate the construction of contigs of transcript sequences. We tested our predictions, creating a database of 10,000 sequences from human breast tumors. The data confirmed the central distribution of the sequences, the significant normalization of the sequence population, the frequent extension of contigs composed of existing human ESTs, and the identification of a series of potentially important homologues of known genes. This approach should make a significant contribution to the early identification of important human genes, the deciphering of the draft human genome sequence currently being compiled, and the shotgun sequencing of the human transcriptome.
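The predicted central bias can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the model used in the study: it assumes the two arbitrary priming sites fall independently and uniformly along a transcript, and it ignores primer annealing specificity, product size selection, and PCR efficiency. The function names `coverage_profile` and `expected_coverage` are hypothetical and introduced only for illustration.

```python
import random


def coverage_profile(length=1000, n_products=20000, seed=0):
    """Simulate per-base coverage of a transcript by arbitrarily primed products.

    Assumption (not from the source): each product spans two priming sites
    drawn independently and uniformly along the transcript.
    """
    rng = random.Random(seed)
    # Difference array: +1 where a product starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for _ in range(n_products):
        a = rng.randrange(length)
        b = rng.randrange(length)
        start, end = min(a, b), max(a, b)
        diff[start] += 1
        diff[end + 1] -= 1
    # A running sum turns the difference array into per-base coverage.
    profile = []
    running = 0
    for pos in range(length):
        running += diff[pos]
        profile.append(running / n_products)
    return profile


def expected_coverage(pos, length):
    """Closed-form coverage under the same uniform-site assumption.

    A base is missed only if both sites fall strictly to its left
    or both fall strictly to its right.
    """
    both_left = (pos / length) ** 2
    both_right = ((length - 1 - pos) / length) ** 2
    return 1.0 - both_left - both_right


if __name__ == "__main__":
    profile = coverage_profile()
    for pos in (10, 250, 500, 750, 990):
        print(pos, round(profile[pos], 3), round(expected_coverage(pos, 1000), 3))
```

Under these assumptions, coverage at relative position x is roughly 2x(1 - x), which peaks at the midpoint of the transcript and falls toward zero at both ends. This is consistent with the central distribution described above, although real products are further shaped by factors the sketch leaves out.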
