New advances in extracting and learning from protein–protein interactions within unstructured biomedical text data

Protein–protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintenance of these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein–protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of automated extraction methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges facing the application of PPI mining to text concerning protein families, using the multifunctional 14-3-3 protein family as an example.
