DOMINO: a novel network-based module detection algorithm with reduced rate of false calls

Network-based module discovery (NBMD) methods have taken a central role in integrative analyses of omics data in modern bioinformatics. NBMD algorithms receive a gene network and nodes' activity scores as input and report sub-networks (modules) that are putatively biologically meaningful in the context of the activity data. Although NBMD methods have existed for almost two decades, only a handful of studies have attempted to compare the biological signals captured by different methods. Here, we first set out to systematically evaluate six popular NBMD methods on gene expression (GE) data and genome-wide association study (GWAS) data. Notably, when testing Gene Ontology (GO) enrichment of the modules obtained by these methods, we observed that GO terms enriched in modules detected on the real data were often also enriched after randomly permuting the input data. To tackle this bias, we designed the EMpirical Pipeline (EMP), a method that infers the empirical significance of the GO enrichment scores of an NBMD solution by computing, for each term, a background distribution of scores on permuted data. We used the EMP to fashion five novel performance evaluation criteria for NBMD methods. Finally, we developed DOMINO (Discovery of Modules In Networks using Omics) - a novel NBMD algorithm. In extensive testing on GE and GWAS data, it outperformed the other six algorithms. As it produces solutions with only a few non-specific GO terms, DOMINO can be used without empirical validation. EMP and DOMINO are available at https://github.com/Shamir-Lab/.
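
The per-term empirical significance idea behind the EMP can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input layout (a score per GO term on the real data and a list of scores per term from permuted runs), and the add-one correction in the p-value are all assumptions made for illustration, and the scores are assumed to be oriented so that larger means more enriched (for example, -log10 of an enrichment p-value).

```python
def empirical_p_value(observed, background):
    """Empirical p-value of one observed score against permutation scores.

    Counts background scores at least as large as the observed one.
    The +1 in numerator and denominator is an assumed, commonly used
    correction that keeps the estimate above zero.
    """
    exceed = sum(1 for s in background if s >= observed)
    return (exceed + 1) / (len(background) + 1)


def empirical_significance(real_scores, permuted_scores):
    """Map each GO term to its empirical p-value.

    real_scores: dict term -> score on the real data (hypothetical layout)
    permuted_scores: dict term -> list of scores on permuted data
    """
    return {
        term: empirical_p_value(score, permuted_scores.get(term, []))
        for term, score in real_scores.items()
    }


# Toy example with made-up numbers: GO:A stands out from its background,
# GO:B scores about as well on permuted data as on the real data.
real = {"GO:A": 8.0, "GO:B": 3.0}
permuted = {
    "GO:A": [1.0, 2.0, 0.5, 1.5],
    "GO:B": [2.5, 3.5, 4.0, 3.0],
}
result = empirical_significance(real, permuted)
```

In this toy input, a term such as GO:B, whose score is routinely matched on permuted data, receives a large empirical p-value and would be treated as non-specific, which is the kind of call the abstract describes the EMP as filtering out.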
