Integrative set enrichment testing for multiple omics platforms

Background: Enrichment testing assesses the overall evidence of differential expression among the elements of a defined set. When many molecular aspects have been measured, e.g. gene expression, metabolites, and proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test, using gene expression and metabolomics as the motivating platforms.

Results: Using two simulation models, we explored the properties of several enrichment methods, including two novel methods: the logistic regression 2-degree-of-freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. Compared with their univariate counterparts, the joint tests can improve our ability to detect associations that are only marginal in univariate tests. The joint tests also improve the ranking of associated pathways. However, some methods carry a risk of Type I error inflation, and self-contained methods lose specificity when the sets are not representative of the underlying association.

Conclusions: In this work we show that considering data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power to detect genomic associations with phenotypes.
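To make the "2-degree of freedom Wald test" concrete, here is a minimal Python sketch of its final step only. It assumes a logistic regression has already been fitted with one coefficient per platform (for instance, one for the gene-level statistic and one for the metabolite-level statistic) and that the two estimates and their 2x2 covariance matrix are available. The abstract does not specify the model, so that setup, the function name `wald_2df`, and the example numbers are all hypothetical. The sketch relies on one standard fact: a chi-square variable with 2 degrees of freedom has survival function exp(-w/2).

```python
import math


def wald_2df(beta, cov):
    """Joint Wald test of H0: both coefficients are zero.

    beta: (b1, b2) estimated coefficients, one per platform (assumed setup).
    cov:  2x2 covariance matrix of the estimates, as nested tuples or lists.
    Returns (W, p) where W = beta^T cov^{-1} beta and p is the upper-tail
    probability of a chi-square distribution with 2 degrees of freedom.
    """
    b1, b2 = beta
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form using the explicit inverse of a 2x2 matrix.
    w = (v22 * b1 * b1 - (v12 + v21) * b1 * b2 + v11 * b2 * b2) / det
    # For 2 degrees of freedom the chi-square survival function is exp(-w/2).
    p = math.exp(-w / 2.0)
    return w, p


# Illustrative, made-up estimates for one pathway.
w, p = wald_2df((0.8, 0.5), ((0.09, 0.01), (0.01, 0.04)))
print(f"W = {w:.3f}, p = {p:.4g}")
```

Because the two coefficients are tested jointly, moderate evidence on each platform can combine into a small joint p-value, which is in keeping with the abstract's observation that joint tests can detect results that are marginal univariately.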
