LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, widely used ORA tools rely on statistical methods such as Fisher’s exact test, which reduce a pathway to a list of genes and ignore both the functionally non-equivalent roles of genes within a pathway and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes these two types of information into account by incorporating network-based gene weights into ORA. In three benchmarks, LEGO achieves better performance than Fisher’s exact test and three other network-based methods. To further evaluate LEGO’s usefulness, we compare it with five gene expression-based and three pathway topology-based methods on a benchmark of 34 disease gene expression datasets compiled in a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes and identify autism-relevant gene sets that could not be found by Fisher’s exact test.
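
For orientation, the conventional baseline that the abstract contrasts with LEGO can be sketched as a one-sided Fisher’s exact (hypergeometric) test, in which every gene in the set counts equally. This is a minimal illustration of standard ORA only, not of LEGO’s network-based weighting; the function name `ora_p_value` and its parameter names are hypothetical and do not come from the paper.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_query, n_overlap):
    """One-sided Fisher's exact (hypergeometric) p-value for ORA.

    Returns P(overlap >= n_overlap) when n_query genes are drawn without
    replacement from a universe of n_universe genes, of which n_set belong
    to the gene set. All genes are treated as equivalent (unweighted).
    """
    max_overlap = min(n_set, n_query)
    total = comb(n_universe, n_query)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # a 200-gene query list, and 5 genes shared between them.
    print(ora_p_value(20000, 100, 200, 5))
```

Because the test depends only on the four counts, two query lists with the same overlap size receive the same p-value regardless of which pathway genes they hit or how those genes are connected, which is the limitation the abstract describes.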
