A network embedding approach to identify active modules in biological interaction networks

The identification of condition-specific gene sets from transcriptomic experiments is important for revealing the regulatory and signaling mechanisms associated with a given cellular response. Statistical methods of differential expression analysis, which are designed to assess variation in individual genes, have trouble highlighting modules of genes whose individual variations are small but whose interactions are essential to characterizing phenotypic changes. Several methods have been proposed in recent years to identify these highly informative gene modules, but their limitations make them of little use to biologists. Here, we propose AMINE, a flexible and efficient method for identifying these active modules that operates on a data embedding combining gene expression and interaction data. Applications to real datasets show that our method can identify new groups of genes of high interest, corresponding to functions not revealed by traditional approaches. Software is available at https://github.com/claudepasquier/amine.
