Experimental annotation of the human genome using microarray technology

The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using ‘exon’ and ‘tiling’ arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.
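The idea of defining transcripts from co-regulated exon expression can be illustrated with a minimal sketch. It is not the procedure used in this work: it simply assumes that each predicted exon has a vector of log expression ratios, one per experimental condition pair, and that adjacent exons whose profiles are strongly correlated are candidates for the same transcript. The function names, the Pearson correlation measure, the 0.8 threshold and the example values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(vx * vy)


def group_exons(profiles, threshold=0.8):
    """Group exons, given in genomic order, into candidate transcripts.

    profiles: list of (exon_id, [log ratio per condition pair]).
    An exon joins the current group when its profile correlates with the
    previous exon's profile at or above `threshold` (an assumed cutoff).
    """
    groups = []
    for exon_id, values in profiles:
        if groups and pearson(groups[-1][-1][1], values) >= threshold:
            groups[-1].append((exon_id, values))
        else:
            groups.append([(exon_id, values)])
    return [[exon_id for exon_id, _ in group] for group in groups]


# Made-up log ratios for four adjacent predicted exons over four condition pairs.
profiles = [
    ("e1", [1.0, -0.5, 0.8, 0.0]),
    ("e2", [0.9, -0.4, 0.7, 0.1]),
    ("e3", [-0.6, 0.9, 0.1, -0.8]),
    ("e4", [-0.5, 1.0, 0.2, -0.7]),
]
transcripts = group_exons(profiles)
print(transcripts)
```

In this toy input, exons e1 and e2 vary together across conditions, as do e3 and e4, so the sketch proposes two candidate transcripts. A real analysis would also need to handle measurement noise, exons skipped by alternative splicing and false-positive exon predictions.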