Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes

Background: Biological pathways are important for understanding biological mechanisms, so identifying the pathways that underlie a biological problem helps researchers focus on the most relevant sets of genes. Pathways are networks with complicated structures, yet most existing pathway enrichment tools ignore the topological information embedded within them, which limits their applicability.

Results: We propose a systematic and extensible pathway enrichment method in which nodes are weighted by network centrality. We demonstrate how the choice of pathway structure and centrality measure, as well as the presence of key genes, affects pathway significance. Our method improves on current methods in two ways. First, because genes differ in character and no single measure captures every aspect of gene importance, the centrality measure is an optional parameter of the model. Second, nodes rather than genes form the basic unit of a pathway, so that one node can be composed of several genes and one gene may reside in different nodes. By comparing our method with the original enrichment method on both simulated and real-world data, we demonstrate its efficacy in finding new pathways that are meaningful from a biological perspective.

Conclusions: Our method can benefit the systematic analysis of biological pathways and help extract more meaningful information from gene expression data. The algorithm has been implemented as an R package, CePa, and a web-based version of CePa is also provided.
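The idea of weighting pathway nodes by centrality can be illustrated with a minimal Python sketch. It is not the CePa implementation: the function names, the toy pathway, the use of total degree as the centrality measure, and the permutation-based p-value are all assumptions chosen for illustration. Nodes map to sets of genes, so a node may hold several genes and a gene may appear in more than one node.

```python
import random


def degree_centrality(nodes, edges):
    """Return the total degree (in plus out) of every pathway node."""
    degree = {node: 0 for node in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def pathway_score(node_genes, weights, de_genes):
    """Sum the weights of nodes containing at least one differential gene."""
    return sum(
        weights[node]
        for node, genes in node_genes.items()
        if genes & de_genes
    )


def permutation_p_value(node_genes, weights, de_genes, background,
                        n_perm=1000, seed=0):
    """Estimate how often a random gene list of equal size scores as high."""
    rng = random.Random(seed)
    observed = pathway_score(node_genes, weights, de_genes)
    pool = sorted(background)  # sorted so the sampling is reproducible
    hits = 0
    for _ in range(n_perm):
        sampled = set(rng.sample(pool, len(de_genes)))
        if pathway_score(node_genes, weights, sampled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy pathway: node B is a two-gene complex, and gene g3
# resides in both node B and node C.
node_genes = {
    "A": {"g1"},
    "B": {"g2", "g3"},
    "C": {"g3", "g4"},
    "D": {"g5"},
}
edges = [("A", "B"), ("A", "C"), ("A", "D")]  # A is the hub
weights = degree_centrality(node_genes, edges)
background = {f"g{i}" for i in range(1, 51)}

p_hub = permutation_p_value(node_genes, weights, {"g1"}, background)
p_leaf = permutation_p_value(node_genes, weights, {"g5"}, background)
print(weights, p_hub, p_leaf)
```

In this toy setting a single differential gene in the hub node A gives a higher pathway score than one in the leaf node D, which is the sense in which a pathway can be "dominated by key genes". Any other centrality measure could be substituted for `degree_centrality` without changing the rest of the sketch.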
