Statistical evaluation of SAGE libraries: consequences for experimental design.

Since the introduction of serial analysis of gene expression (SAGE) as a method to quantitatively analyze the differential expression of genes, several statistical tests have been published for the pairwise comparison of SAGE libraries. Testing the difference between the number of specific tags found in two SAGE libraries is hampered by the fact that each SAGE library is only one measurement: the necessary information on biological variation or experimental precision is not available. In the currently available tests, a measure of this variance is obtained from simulation or based on the properties of the tag distribution. To help SAGE users choose between these tests, five pairwise tests were compared by determining their critical values, that is, the lowest number of tags that, given an observed number of tags in one library, needs to be found in the other library to result in a significant P value. The five tests included in this comparison are SAGE300, the tests described by Madden et al. (Oncogene 15: 1079-1085, 1997) and by Audic and Claverie (Genome Res 7: 986-995, 1997), Fisher's Exact test, and the Z test, which is equivalent to the chi-squared test. The comparison showed that, for SAGE libraries of equal as well as different size, SAGE300, Fisher's Exact test, the Z test, and the Audic and Claverie test have critical values within 1.5% of each other. This indicates that these four tests will give essentially the same results when applied to SAGE libraries. The Madden test, which can only be used for libraries of similar size, is more conservative, with critical values that are 25% higher, probably because the variance measure in its test statistic is not appropriate for hypothesis testing. The consequences for the choice of SAGE library sizes are discussed.
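The notion of a critical value used above can be illustrated with a minimal sketch for two of the five tests, the Z test and Fisher's Exact test. The function names, the two-sided alternative, and the significance level of 0.05 are assumptions made for this illustration; the abstract does not state them, and this is not the authors' implementation.

```python
import math


def z_test_p(x1, n1, x2, n2):
    """Two-sided P value of the pooled two-proportion Z test.

    x1, x2: counts of one tag in each library; n1, n2: library sizes.
    """
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_p(x1, n1, x2, n2):
    """Two-sided Fisher's Exact test P value for the 2x2 table of tag counts.

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one (assumed definition of 'two-sided').
    """
    total = x1 + x2
    lo, hi = max(0, total - n2), min(total, n1)
    log_denom = _log_choose(n1 + n2, total)

    def pmf(k):
        return math.exp(_log_choose(n1, k) + _log_choose(n2, total - k) - log_denom)

    observed = pmf(x1)
    tail = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
    return min(1.0, tail)


def critical_value(x1, n1, n2, p_value=z_test_p, alpha=0.05):
    """Lowest count in library 2, at or above the expected count, with P < alpha.

    Returns None if no count up to n2 is significant.
    """
    x2 = math.ceil(x1 * n2 / n1)
    while x2 <= n2:
        if p_value(x1, n1, x2, n2) < alpha:
            return x2
        x2 += 1
    return None


if __name__ == "__main__":
    # Hypothetical example: 10 tags observed in a library of 50,000 tags,
    # compared against a second library of the same size.
    for name, test in (("Z test", z_test_p), ("Fisher's Exact", fisher_p)):
        print(name, critical_value(10, 50000, 50000, p_value=test))
```

Calling `critical_value` over a range of observed counts and library sizes gives the kind of table on which such a comparison of tests can be based. SAGE300, the Madden test, and the Audic and Claverie test are not sketched here.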

[1] M. F. Fuller, et al. Practical Nonparametric Statistics; Nonparametric Statistical Inference, 1973.

[2] Douglas G. Altman, et al. Practical statistics for medical research, 1990.

[3] A. Agresti. [A Survey of Exact Inference for Contingency Tables]: Rejoinder, 1992.

[4] S. Madden, et al. SAGE transcript profiles for p53-dependent growth regulation, 1997, Oncogene.

[5] R. H. Hruban, et al. Gene expression profiles in normal and cancer cells, 1997, Science.

[6] J. Claverie, et al. The significance of digital gene expression profiles, 1997, Genome research.

[7] Stephen F. Altschul, et al. Characterization of Gene Expression in Resting and Activated Mast Cells, 1998, The Journal of experimental medicine.

[8] J. Claverie. Computational methods for the identification of differential and coordinated gene expression, 1999, Human molecular genetics.

[9] S. Altschul, et al. A public database for gene expression in human cancers, 1999, Cancer research.

[10] Martin Vingron, et al. Computational aspects of expression data, 1999, Journal of Molecular Medicine.

[11] P. M. Bossuyt, et al. Genes differentially expressed in medulloblastoma and fetal brain, 1999, Physiological genomics.

[12] M. G. Koerkamp, et al. Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources, 1999, Molecular biology of the cell.

[13] S. Altschul, et al. SAGEmap: a public gene expression resource, 2000, Genome research.

[14] K. Mühlemann, et al. Transcriptome analysis of fibroblast cells immediate-early after human cytomegalovirus infection, 2000, Journal of molecular biology.

[15] J. Stollberg, et al. A quantitative evaluation of SAGE, 2000, Genome research.

[16] Yixin Wang, et al. POWER_SAGE: comparing statistical tests for SAGE experiments, 2000, Bioinform.

[17] Elliott H. Margulies, et al. eSAGE: managing and analysing data generated with Serial Analysis of Gene Expression (SAGE), 2000, Bioinform.

[18] Antoine H. C. van Kampen, et al. USAGE: a web-based approach towards the analysis of SAGE data, 2000, Bioinform.

[19] Ji Huang, et al. [Serial analysis of gene expression], 2002, Yi chuan = Hereditas.