Gene set enrichment analysis made simple

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed between two conditions. A common statistical approach is to quantify the evidence for each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticised for ignoring biological knowledge about how genes work together. Recently, a series of methods that do incorporate biological knowledge have been proposed. However, the most popular of these, gene set enrichment analysis (GSEA), seems overly complicated. Furthermore, GSEA is based on a statistical test known for its lack of sensitivity. In this article we compare the performance of a simple alternative with that of GSEA. Using eight different microarray datasets, we find that this simple solution clearly outperforms GSEA.
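The gene-by-gene approach described in the abstract (a p-value per gene, a multiple-comparison adjustment, a cut-off, a candidate list) can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions: expression values arrive as dictionaries mapping a gene name to a list of measurements per condition, the per-gene p-value uses a normal approximation to a Welch-type statistic (a real analysis would more likely use a t-test or a moderated statistic), and the adjustment is Benjamini-Hochberg. The function names `welch_z_pvalue`, `benjamini_hochberg`, and `candidate_genes` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_z_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Assumption: uses a normal approximation to the Welch statistic,
    which is only a rough stand-in for a proper t-test on small samples.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def candidate_genes(expr_a, expr_b, cutoff=0.05):
    """List genes whose adjusted p-value falls at or below the cut-off."""
    genes = sorted(expr_a)
    pvals = [welch_z_pvalue(expr_a[g], expr_b[g]) for g in genes]
    adj = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, adj) if q <= cutoff]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    condition_a = {"g1": [10.0, 10.1, 9.9, 10.2], "g2": [5.0, 5.1, 4.9, 5.0]}
    condition_b = {"g1": [12.0, 12.1, 11.9, 12.2], "g2": [5.05, 4.95, 5.1, 4.9]}
    print(candidate_genes(condition_a, condition_b))
```

Each gene is scored in isolation here, so the resulting list carries no information about which genes belong to the same pathway or functional category. That is the limitation that gene set methods aim to address.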
