Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods

Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding which methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano, which collects a range of GSA methods into the same system for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at the gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and that the major separation correlates well with our defined directionality classes. As a consequence, we suggest using a consensus scoring approach based on multiple GSA runs. In combination with the directionality classes, this constitutes a more thorough basis for an enriched biological interpretation.
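
The idea of consensus scoring over multiple GSA runs can be illustrated with a minimal Python sketch. It assumes that each run yields one P-value per gene set and that runs are combined by taking the median of the per-run ranks. The abstract does not specify the aggregation rule, so the median-rank choice, the function names `rank_p_values` and `consensus_scores`, and the toy P-values are all illustrative assumptions rather than the Piano implementation.

```python
from statistics import median


def rank_p_values(p_values):
    """Map gene set -> rank (1 = smallest P-value); tied values share the average rank."""
    ordered = sorted(p_values, key=p_values.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every gene set tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and p_values[ordered[j + 1]] == p_values[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def consensus_scores(runs):
    """Combine several GSA runs into one score per gene set (assumed rule: median rank).

    runs: dict mapping a run/method label to a dict of gene set -> P-value.
    Only gene sets present in every run are scored.
    """
    per_run_ranks = [rank_p_values(p) for p in runs.values()]
    shared = set.intersection(*(set(r) for r in per_run_ranks))
    return {gs: median(r[gs] for r in per_run_ranks) for gs in shared}


# Toy input: three hypothetical GSA runs over the same three gene sets.
runs = {
    "method_a": {"set1": 0.001, "set2": 0.20, "set3": 0.040},
    "method_b": {"set1": 0.010, "set2": 0.50, "set3": 0.005},
    "method_c": {"set1": 0.002, "set2": 0.30, "set3": 0.050},
}

scores = consensus_scores(runs)
for gene_set in sorted(scores, key=scores.get):
    print(gene_set, scores[gene_set])
```

A lower consensus score marks a gene set that ranks highly across most runs. In a directionality-aware workflow, one such ranking would be computed separately for each directionality class.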
