NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis

Background: High-throughput experimental techniques have improved dramatically and been widely applied over the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, remains a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified by current functional enrichment analysis tools often include direct parent or descendant terms in the GO hierarchy. Such highly redundant terms make it difficult for users to analyze the underlying biological processes.

Results: In this paper, a novel network-based probabilistic generative model, NetGen, is proposed for functional enrichment analysis. An additional protein-protein interaction (PPI) network is explicitly used to assist the identification of significantly enriched GO terms. NetGen outperformed existing methods in simulation studies. The effectiveness of NetGen was further explored on four real datasets. Notably, for each disease, several GO terms that were not directly linked to the active gene list were identified. These terms were found to be closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available on GitHub (http://github.com/wulingyun/CopTea/).

Conclusion: Our procedure leads to more reasonable and interpretable functional enrichment results. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods and can help to explore the underlying pathogenesis of complex diseases.
