Augmenting subnetwork inference with information extracted from the scientific literature

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge in these studies is to explain how the genes identified as relevant in a given experiment are organized into a subnetwork that accounts for the response of interest. Subnetwork inference typically depends on the information in public, structured databases, which are incomplete. However, a wealth of potentially relevant information resides in the scientific literature, including information about genes associated with particular concepts of interest and about interactions among various biological entities. We contend that exploiting this information can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that literature-extracted information can be used to (i) augment the set of entities identified as relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We apply these approaches to uncover the pathways involved in interactions between a virus and a host cell, and the pathways regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks.
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference
