21. UniGene: A Unified View of the Transcriptome

The task of assembling an inventory of all genes of Homo sapiens and other organisms began more than a decade ago with large-scale survey sequencing of transcribed sequences. The resulting Expressed Sequence Tags (ESTs) were a gold mine of novel gene sequences, and they provided an infrastructure for additional large-scale projects, such as gene maps, expression systems, and full-length cDNA projects. In addition, countless targeted gene-hunting projects have benefited from the availability of these sequences and the corresponding physical clone reagents. However, the high level of redundancy among transcribed sequences, together with a variety of common experimental artifacts, made it difficult for many researchers to use the data effectively. This problem motivated the development of UniGene [http://www.ncbi.nlm.nih.gov/UniGene/], a largely automated analytical system that produces an organized view of the transcriptome. In this chapter, we describe the properties of the input sequences and the process by which UniGene analyzes them, and we offer some pointers on how to use the resource.
