BMC Bioinformatics BioMed Central Methodology article Comparative evaluation of gene-set analysis methods

BackgroundMultiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" Via. subject sampling. The three methods were compared based on a simulation experiment and analyses of three real-world microarray datasets.ResultsIn the simulation experiment, we found that the use of the asymptotic distribution in the two Global Tests leads to a statistical test with an incorrect size. Specifically, p-values calculated by the scaled χ2 distribution of Global Test and the asymptotic distribution of ANCOVA Global Test are too liberal, while the asymptotic distribution with a quadratic form of the Global Test results in p-values that are too conservative. The two Global Tests with permutation-based inference, however, gave a correct size. While the three methods showed similar power using permutation inference after a proper standardization of gene expression data, SAM-GS showed slightly higher power than the Global Tests. In the analysis of a real-world microarray dataset, the two Global Tests gave markedly different results, compared to SAM-GS, in identifying pathways whose gene expressions are associated with p53 mutation in cancer cell lines. A proper standardization of gene expression variances is necessary for the two Global Tests in order to produce biologically sensible results. After the standardization, the three methods gave very similar biologically-sensible results, with slightly higher statistical significance given by SAM-GS. The three methods gave similar patterns of results in the analysis of the other two microarray datasets.ConclusionAn appropriate standardization makes the performance of all three methods similar, given the use of permutation-based inference. SAM-GS tends to have slightly higher power in the lower α-level region (i.e. gene sets that are of the greatest interest). Global Test and ANCOVA Global Test have the important advantage of being able to analyze continuous and survival phenotypes and to adjust for covariates. A free Microsoft Excel Add-In to perform SAM-GS is available from http://www.ualberta.ca/~yyasui/homepage.html.

[1]  Jelle J. Goeman,et al.  A global test for groups of genes: testing association with a clinical outcome , 2004, Bioinform..

[2]  A. Poustka,et al.  Parameter estimation for the calibration and variance stabilization of microarray data , 2003, Statistical applications in genetics and molecular biology.

[3]  H C van Houwelingen,et al.  Testing the fit of a regression model via score tests in random effects models. , 1995, Biometrics.

[4]  Charles A Tilford,et al.  Gene set enrichment analysis. , 2009, Methods in molecular biology.

[5]  Martin Vingron,et al.  Variance stabilization applied to microarray data calibration and to the quantification of differential expression , 2002, ISMB.

[6]  Qi Liu,et al.  Improving gene set analysis of microarray data by SAM-GS , 2007, BMC Bioinformatics.

[7]  F R Rosendaal,et al.  Testing familial aggregation. , 1995, Biometrics.

[8]  U. Mansmann,et al.  Testing Differential Gene Expression in Functional Groups , 2005, Methods of Information in Medicine.

[9]  Peter Bühlmann,et al.  Analyzing gene expression data in terms of gene sets: methodological issues , 2007, Bioinform..

[10]  P. Park,et al.  Discovering statistically significant pathways in expression profiling studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[11]  John D. Potter,et al.  Improving GSEA for Analysis of Biologic Pathways for Differential Gene Expression across a Binary Phenotype , 2007 .

[12]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[13]  John D. Storey A direct approach to false discovery rates , 2002 .

[14]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Andrew B. Nobel,et al.  Significance analysis of functional categories in gene expression studies: a structured permutation approach , 2005, Bioinform..

[16]  P. Khatri,et al.  Global functional profiling of gene expression ? ? This work was funded in part by a Sun Microsystem , 2003 .

[17]  M. Daly,et al.  PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes , 2003, Nature Genetics.

[18]  Jelle J. Goeman,et al.  Testing association of a pathway with survival using gene expression data , 2005, Bioinform..

[19]  Sara van de Geer,et al.  Testing against a high dimensional alternative , 2006 .

[20]  Jun Lu,et al.  Pathway level analysis of gene expression using singular value decomposition , 2005, BMC Bioinformatics.