Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information

Because protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential for probing disease development and discerning gene/protein functions and biological processes. Some curated datasets contain PPI data derived from the literature and other sources (e.g., IntAct, BioGRID, DIP, and HPRD). However, they are far from exhaustive, and their maintenance is a labor-intensive process. On the other hand, machine learning methods to automate PPI knowledge extraction from the scientific literature have been limited by a shortage of appropriate annotated data. This work presents a unified, multi-source PPI corpus with vetted interaction definitions, augmented by binary interaction type labels, and a Transformer-based deep learning method that exploits entities’ relational context information for relation representation to improve relation classification performance. The model’s performance is evaluated on four widely studied biomedical relation extraction datasets, as well as this work’s target PPI datasets, to observe the effectiveness of the representation for relation extraction tasks across varied data. Results show the model outperforms prior state-of-the-art models. The code and data are available at: https://github.com/BNLNLP/PPI-Relation-Extraction
