A quantitative evaluation of SAGE.

Serial Analysis of Gene Expression (SAGE) is an innovative technique that can catalog both the identity and the relative frequency of the mRNA transcripts in a given poly(A)+ RNA preparation. It is a very effective approach for determining the expression of mRNA populations. However, the observed results carry significant biases that are inherent in the experimental process. These biases arise from sampling error, sequencing error, nonuniqueness of tag sequences, and nonrandomness of tag sequences. The quantitative information sought from SAGE experiments consists of two estimates: the number of genes, and the frequency distribution of transcript copy numbers. A further concern is the extent to which a given tag sequence can be assumed to be unique to its gene. The present study takes these biases into account and provides a basis for maximum likelihood estimation of gene number and transcript copy frequencies from a set of experimental results. These estimates of the true state of genomic expression differ markedly from estimates based directly on the raw observations. It is also shown that a given tag sequence is probably unique within the genome in many cases, but in larger genomes this cannot be safely assumed.
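Two of the effects named above, tag nonuniqueness and sampling error, can be illustrated with a minimal sketch. This is not the study's maximum likelihood procedure. The sketch assumes 10-base tags and that every tag is an independent, uniformly random DNA string. That second assumption deliberately ignores the nonrandomness of real tag sequences. The sketch also assumes that sequenced tags are independent draws from the mRNA pool. The function names and the genome and library sizes are hypothetical and chosen only for illustration.

```python
def prob_tag_unique(num_genes: int, tag_length: int = 10) -> float:
    """Probability that one gene's tag is shared by none of the other
    num_genes - 1 genes.

    Assumes each gene carries one tag that is an independent, uniformly
    random DNA string of tag_length bases (real tags are not random).
    """
    possible_tags = 4 ** tag_length  # 1,048,576 distinct 10-base tags
    return (1.0 - 1.0 / possible_tags) ** (num_genes - 1)


def prob_detected(freq: float, tags_sequenced: int) -> float:
    """Probability that a transcript making up fraction `freq` of the mRNA
    pool yields at least one tag among `tags_sequenced` independent draws.

    Models sampling error only; sequencing error is ignored.
    """
    return 1.0 - (1.0 - freq) ** tags_sequenced


if __name__ == "__main__":
    # Hypothetical genome sizes, chosen only for illustration.
    for genes in (6_000, 30_000):
        print(f"{genes} genes: P(tag unique) = {prob_tag_unique(genes):.4f}")
    # A transcript at 1 copy per 100,000 in a 50,000-tag library.
    print(f"P(detected) = {prob_detected(1e-5, 50_000):.3f}")
```

Under these assumptions, the probability that a tag is unique falls from about 0.994 with 6,000 genes to about 0.972 with 30,000 genes. This matches the direction of the abstract's point about larger genomes. A transcript present at 1 copy in 100,000 has only about a 39% chance of being seen at all in a 50,000-tag library. Effects like these make raw tag counts a biased picture of the underlying expression.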
