Meta-Analysis of Pathway Enrichment: Combining Independent and Dependent Omics Data Sets

A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry-based Metabolomics and Proteomics or DNA microarray- or RNA-seq-based Transcriptomics. Especially in non-targeted Metabolomics experiments, where it is often impossible to map ion features from mass spectrometry analysis unambiguously to metabolites, integration with data from more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is (Gene) Set Enrichment Analysis. To combine the results of different analyses, we introduce a methodological framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. those obtained from the same biological samples, the framework uses a covariance estimation procedure based on the pathways that are nonsignificant in the enrichment analysis of the single data sets. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. Restricting the range of p-values of the pathways considered in the estimation reduced the overestimation of correlation that is introduced by the significant pathways. When applied to the real data sets, the meta-analysis proved to be a powerful tool not only for investigating the correlation between different data sets and summarizing the results of multiple analyses, but also for distinguishing experiment-specific key pathways.
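The idea of combining pathway p-values under an estimated dependence can be illustrated with a minimal sketch. It assumes two data sets, one-sided enrichment p-values, and Stouffer's Z-transform as the combination rule; the abstract does not state which combination rule or estimator the framework actually uses, so this is an illustration rather than the authors' implementation. The function names (`to_z`, `estimate_correlation`, `combine_two`) and the default p-value window (`lower`, `upper`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def to_z(p):
    """Map a one-sided p-value to a z-score (small p gives large positive z)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return _STD_NORMAL.inv_cdf(1.0 - p)


def estimate_correlation(p_a, p_b, lower=0.05, upper=1.0):
    """Pearson correlation of z-scores, using only pathways whose p-values
    fall strictly inside (lower, upper) in BOTH data sets.

    The window is an assumed way of excluding significant pathways,
    which would otherwise inflate the estimated correlation.
    """
    pairs = [(to_z(a), to_z(b)) for a, b in zip(p_a, p_b)
             if lower < a < upper and lower < b < upper]
    if len(pairs) < 3:
        return 0.0  # too few pathways: fall back to independence
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def combine_two(p_a, p_b, rho=0.0):
    """Stouffer-type combination of two p-values for one pathway.

    Var(z_a + z_b) = 2 + 2 * rho, so rho = 0 gives the usual
    independent-test formula and rho = 1 returns the single-test p-value.
    """
    variance = 2.0 + 2.0 * rho
    if variance <= 0.0:
        raise ValueError("rho must be greater than -1")
    z = (to_z(p_a) + to_z(p_b)) / sqrt(variance)
    return 1.0 - _STD_NORMAL.cdf(z)
```

In use, `estimate_correlation` would be called once on the two full lists of pathway p-values, and the resulting `rho` passed to `combine_two` for each pathway. A positive `rho` makes the combined p-value less extreme than the independent-test result, which is the intended correction when both data sets come from the same biological samples.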
