The significance of digital gene expression profiles.

Genes differentially expressed in different tissues, during development, or in specific pathologies are of foremost interest to both basic and pharmaceutical research. "Transcript profiles" or "digital Northerns" are routinely generated by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study of the influence of random fluctuations and sample size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the selection threshold for putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false-positive clones one is willing to risk. Our results define more precisely, and extend, the limits within which digital Northern data can be used.
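The abstract does not spell out the test itself. As an illustration only, the sketch below assumes that tag counts arise from Poisson sampling and uses a commonly cited Poisson-derived form. In that form, the probability of observing y tags for a gene in library 2 (N2 tags in total), given x tags for the same gene in library 1 (N1 tags in total), is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The function names `prob_y_given_x` and `tag_count_pvalue` and the rule for choosing which tail to sum are hypothetical choices made for this example. They are not necessarily the authors' exact procedure.

```python
import math


def prob_y_given_x(x: int, y: int, n1: int, n2: int) -> float:
    """P(y tags in library 2 | x tags in library 1), assuming Poisson sampling.

    n1 and n2 are the total numbers of tags sequenced in each library.
    Computed in log space (lgamma) so large counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)  # log((1 + r)^(x+y+1))
    )
    return math.exp(log_p)


def tag_count_pvalue(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided p-value for the difference between x and y.

    Hypothetical tail rule: if the gene's relative frequency in library 2
    is at least that in library 1, return P(Y >= y); otherwise P(Y <= y).
    """
    # P(Y <= y - 1): sum of the exact probabilities below the observed count
    cdf_below = sum(prob_y_given_x(x, k, n1, n2) for k in range(y))
    if y * n1 >= x * n2:
        # Upper tail; clamp to guard against tiny negative rounding errors
        return max(0.0, 1.0 - cdf_below)
    # Lower tail, including the observed count itself
    return min(1.0, cdf_below + prob_y_given_x(x, y, n1, n2))


if __name__ == "__main__":
    # Two equally sized libraries of 10,000 tags: 2 tags vs. 20 tags
    print(tag_count_pvalue(2, 20, 10_000, 10_000))
```

With N1 = N2 = 10,000, a gene seen twice in one library and 20 times in the other gets p ≈ 6.1e-5 under these assumptions. To relate a p-value cutoff to false positives, note that if G genes are tested at a cutoff alpha and none is truly regulated, about G × alpha of them are still expected to pass by chance. This is one simple way to see how a selection threshold trades off against the fraction of false positives one accepts.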
