Enrichment Analysis of Metabolic Pathways Using P-value Perturbation

Motivation: Assessing the enrichment of biological pathways is a critical problem in systems biology that has received much attention in recent years. However, available methods often make simplifying assumptions about correlations among pathway components, rely on appropriate choices of input parameters and gene-set sizes, or both. In addition, current methods often focus on gene expression data and are not suitable for other sources of omics data, in particular data from metabolomic and proteomic studies.

Results: We propose a new, easy-to-implement methodology for assessing the enrichment of biological pathways, which extends enrichment analysis to other sources of omics data. The proposed method is based on a perturbed version of regular p-values, which results in a simultaneous reduction in false positive and false negative errors under complex correlation structures. Two choices of enrichment scores are proposed, and extensive numerical studies on simulated data and on real data from a metabolomic profiling study in bladder cancer indicate that the new method offers significant improvements over state-of-the-art methods of enrichment analysis.

Availability and Implementation: The proposed methodology is
