A comprehensive transcript index of the human genome generated using microarrays and computational approaches

BackgroundComputational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22.ResultsThe combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale.ConclusionsThese data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized.

[1]  P. Green,et al.  Analysis of expressed sequence tags indicates 35,000 human genes , 2000, Nature Genetics.

[2]  R. Stoughton,et al.  Experimental annotation of the human genome using microarray technology , 2001, Nature.

[3]  A. Sparks,et al.  Using the transcriptome to annotate the genome , 2002, Nature Biotechnology.

[4]  E. Kolker,et al.  Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays. , 2002, Nucleic acids research.

[5]  M. Hattori,et al.  The DNA sequence of human chromosome 21 , 2000, Nature.

[6]  P. Linsley,et al.  Modulation of TCR-induced transcriptional profiles by ligation of CD28, ICOS, and CTLA-4 receptors , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Eric E Schadt,et al.  Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing , 2003, Genome Biology.

[8]  R. Treisman,et al.  Spatial flexibility in ternary complexes between SRF and its accessory proteins. , 1992, The EMBO journal.

[9]  Nicholas L. Bray,et al.  AVID: A global alignment program. , 2003, Genome research.

[10]  Emmanuel Dias-Neto,et al.  The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[11]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[12]  Philip Lijnzaad,et al.  The Ensembl genome database project , 2002, Nucleic Acids Res..

[13]  Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome , 2002, Nature.

[14]  D R Bentley,et al.  The DNA sequence and comparative analysis of human chromosome 20 , 2004, Nature.

[15]  Joseph M. Dale,et al.  Empirical Analysis of Transcriptional Activity in the Arabidopsis Genome , 2003, Science.

[16]  N. Nomura,et al.  Complete sequencing and characterization of 21,243 full-length human cDNAs , 2004, Nature Genetics.

[17]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[18]  Melanie E. Goward,et al.  The DNA sequence of human chromosome 22 , 1999, Nature.

[19]  Donna R. Maglott,et al.  RefSeq and LocusLink: NCBI gene-centered resources , 2001, Nucleic Acids Res..

[20]  R. Stoughton,et al.  Genetics of gene expression surveyed in maize, mouse and man , 2003, Nature.

[21]  R D Klausner,et al.  The mammalian gene collection. , 1999, Science.

[22]  S. P. Fodor,et al.  Large-Scale Transcriptional Activity in Chromosomes 21 and 22 , 2002, Science.

[23]  T. Kawamoto,et al.  Identification of the human beta-actin enhancer and its binding factor , 1988, Molecular and cellular biology.

[24]  Ian Dunham,et al.  Reevaluating human gene annotation: a second-generation analysis of chromosome 22. , 2003, Genome research.

[25]  V. Solovyev,et al.  Ab initio gene finding in Drosophila genomic DNA. , 2000, Genome research.

[26]  J. Castle,et al.  Genome-Wide Survey of Human Alternative Pre-mRNA Splicing with Exon Junction Microarrays , 2003, Science.

[27]  J. Steitz,et al.  The expanding universe of noncoding RNAs. , 2006, Cold Spring Harbor symposia on quantitative biology.

[28]  R. Frederickson,et al.  5' flanking and first intron sequences of the human beta-actin gene required for efficient promoter activity. , 1989, Nucleic acids research.

[29]  John Quackenbush,et al.  Gene Index analysis of the human genome estimates approximately 120,000 genes , 2000, Nature Genetics.

[30]  D. Mccormick Sequence the Human Genome , 1986, Bio/Technology.

[31]  L. Pachter,et al.  rVista for comparative sequence-based discovery of functional transcription factor binding sites. , 2002, Genome research.

[32]  C. Li,et al.  Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[33]  S. Cawley,et al.  Unbiased Mapping of Transcription Factor Binding Sites along Human Chromosomes 21 and 22 Points to Widespread Regulation of Noncoding RNAs , 2004, Cell.

[34]  Alan K. Mackworth,et al.  Evaluation of gene-finding programs on mammalian sequences. , 2001, Genome research.

[35]  J. Rinn,et al.  The transcriptional activity of human Chromosome 22. , 2003, Genes & development.

[36]  Al Stutz,et al.  A draft annotation and overview of the human genome , 2001, Genome Biology.

[37]  Edward C. Uberbacher,et al.  Automated Gene Identification in Large-Scale Genomic Sequences , 1997, J. Comput. Biol..

[38]  R. Fleischmann,et al.  Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. , 1995, Nature.

[39]  G. Rubin,et al.  A computer program for aligning a cDNA sequence with a genomic DNA sequence. , 1998, Genome research.

[40]  Alex Bateman,et al.  The InterPro Database, 2003 brings increased coverage and new features , 2003, Nucleic Acids Res..

[41]  J. Claverie Computational methods for the identification of genes in vertebrate genomic sequences. , 1997, Human molecular genetics.

[42]  S. Cawley,et al.  Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. , 2004, Genome research.

[43]  D. Davison,et al.  d2_cluster: a validated method for clustering EST and full-length cDNAsequences. , 1999, Genome research.

[44]  Yudong D. He,et al.  Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer , 2001, Nature Biotechnology.

[45]  Colin N. Dewey,et al.  Initial sequencing and comparative analysis of the mouse genome. , 2002 .

[46]  T. Hughes,et al.  Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. , 2000, Science.