EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information

Motivation: The rapid growth of the biomedical literature accumulates diverse and comprehensive knowledge that remains hidden and waiting to be mined, such as drug-drug interactions. However, it is difficult to extract this heterogeneous knowledge and to retrieve, or even discover, the latest and novel findings in an efficient manner. To address this problem, we propose EGFI for extracting and consolidating drug-drug interactions from large-scale medical literature. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI builds on the language model BioBERT, which has been comprehensively pre-trained on biomedical corpora. In particular, we combine a multi-head self-attention mechanism with a BiGRU to fuse multiple sources of semantic information for rigorous context modeling. In the generation part, EGFI uses another pre-trained language model, BioGPT-2, and the generated sentences are selected according to filtering rules. Results: We evaluated the classification part on the "DDIs 2013" dataset and the "DDTs" dataset, achieving F1 scores of 0.842 and 0.720, respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified the filtered sentences against the existing ground truth. Generated sentences that are not recorded in DrugBank or the DDIs 2013 dataset further demonstrate the potential of EGFI to identify novel drug relationships. Contact: lhuang93-c@my.cityu.edu.hk. Supplementary information: Supplementary data are available at Bioinformatics online.
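To make the described classification architecture concrete, below is a minimal sketch (not the authors' released code) of how a BioBERT encoder, a BiGRU, and multi-head self-attention can be combined for DDI relation classification. The checkpoint name, hidden sizes, mean pooling, the five relation classes, and the entity-marked example sentence are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming a BioBERT encoder whose token embeddings are re-read
# by a BiGRU and fused with the raw encoder output via multi-head self-attention.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class DDIClassifier(nn.Module):
    def __init__(self, encoder_name="dmis-lab/biobert-base-cased-v1.1",
                 hidden=768, gru_hidden=384, num_heads=8, num_classes=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # BioBERT
        self.bigru = nn.GRU(hidden, gru_hidden, batch_first=True,
                            bidirectional=True)                  # 2 * 384 = 768
        self.fusion = nn.MultiheadAttention(hidden, num_heads,
                                            batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)          # DDI relation types

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the pre-trained biomedical encoder.
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Sentence-level recurrent view of the same tokens.
        gru_out, _ = self.bigru(tokens)
        # Multi-head attention fuses the recurrent view (query) with the
        # encoder view (key/value); padding positions are masked out.
        fused, _ = self.fusion(gru_out, tokens, tokens,
                               key_padding_mask=~attention_mask.bool())
        pooled = fused.mean(dim=1)                                # simple mean pooling
        return self.classifier(pooled)                            # relation logits

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
# Hypothetical entity-marked input; the markers are illustrative only.
batch = tokenizer(["<e1>Erythromycin</e1> may potentiate <e2>warfarin</e2>."],
                  return_tensors="pt", padding=True)
logits = DDIClassifier()(batch["input_ids"], batch["attention_mask"])
```

In the same spirit, the generation part can be approximated by sampling candidate sentences from a biomedical GPT-2-style model and keeping only those that this classifier assigns to a DDI relation with high confidence, which is one plausible reading of the filtering step described above.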
