ModEx: A text mining system for extracting mode of regulation of Transcription Factor-gene regulatory interaction

Transcription factors (TFs) are proteins that are fundamental to transcription and regulation of gene expression. Each TF may regulate multiple genes, and each gene may be regulated by multiple TFs. A TF can act as either an activator or a repressor of gene expression. This complex network of interactions between TFs and genes underlies many developmental and biological processes and is implicated in several human diseases such as cancer. Hence, deciphering the network of TF-gene interactions together with their mode of regulation (activation vs. repression) is an important step toward understanding the regulatory pathways that underlie complex traits. There are many experimental, computational, and manually curated databases of TF-gene interactions. In particular, high-throughput ChIP-seq datasets provide a large-scale map of transcriptional regulatory interactions. However, these interactions are not annotated with information on context and mode of regulation. Such information is crucial for gaining a global picture of gene regulatory mechanisms and can aid in developing machine learning models for applications such as biomarker discovery, prediction of response to therapy, and precision medicine. In this work, we introduce a text-mining system that annotates ChIP-seq-derived interactions with such metadata by mining PubMed articles. We evaluate the performance of our system against the small-scale, manually curated TRRUST database, which we use as a gold standard. Our results show that the method extracts mode of regulation with an F-score of 0.77 on TRRUST curated interactions and an F-score of 0.96 on the intersection of TRRUST and the ChIP-seq-derived network. We provide an HTTP REST API for our code to facilitate usage.

Availability: Source code and datasets are available for download on GitHub: https://github.com/samanfrm/modex. HTTP REST API: https://watson.math.umb.edu/modex/
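To make the reported evaluation metric concrete, the following minimal Python sketch computes an F-score for one mode of regulation by comparing predicted labels against a curated gold standard. It is an illustration only and is not taken from the ModEx source code: the function name `f_score`, the dictionary layout, and the placeholder TF and gene identifiers are all assumptions, and the toy labels are made up rather than drawn from TRRUST.

```python
from typing import Dict, Tuple

# A (TF, target gene) pair; the identifiers below are placeholders, not real data.
Pair = Tuple[str, str]


def f_score(predicted: Dict[Pair, str], gold: Dict[Pair, str], label: str) -> float:
    """Return the F-score (harmonic mean of precision and recall) for one label."""
    # True positives: pairs predicted as `label` that the gold standard also marks as `label`.
    tp = sum(1 for pair, mode in predicted.items()
             if mode == label and gold.get(pair) == label)
    # False positives: pairs predicted as `label` that the gold standard does not mark as `label`.
    fp = sum(1 for pair, mode in predicted.items()
             if mode == label and gold.get(pair) != label)
    # False negatives: gold `label` pairs that were missed or given another label.
    fn = sum(1 for pair, mode in gold.items()
             if mode == label and predicted.get(pair) != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold-standard annotations (illustrative only).
gold = {
    ("TF_A", "GENE_1"): "activation",
    ("TF_A", "GENE_2"): "repression",
    ("TF_B", "GENE_3"): "activation",
    ("TF_C", "GENE_4"): "activation",
}

# Hypothetical text-mined predictions (illustrative only).
predicted = {
    ("TF_A", "GENE_1"): "activation",
    ("TF_A", "GENE_2"): "activation",
    ("TF_B", "GENE_3"): "activation",
}

if __name__ == "__main__":
    print(f_score(predicted, gold, "activation"))
```

In this toy case two of the three predicted activations agree with the gold standard and one gold activation is missed, so precision and recall are both 2/3. How the published scores were aggregated across the two modes of regulation is not stated here, so the per-label formulation above is one plausible reading rather than the authors' exact procedure.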
