Document-Level Biomedical Relation Extraction Leveraging Pretrained Self-Attention Structure and Entity Replacement: Algorithm and Pretreatment Method Validation Study

Background: Most current methods for intrasentence relation extraction in the biomedical literature are inadequate for document-level relation extraction, in which a relationship may cross sentence boundaries. Some approaches therefore extract relations by splitting document-level datasets using heuristic rules and learning methods. However, these approaches may introduce additional noise and do not truly solve the problem of intersentence relation extraction. Avoiding this noise while still extracting cross-sentence relations remains challenging.

Objective: This study aimed to avoid the errors introduced by dividing the document-level dataset, to verify that a self-attention structure can extract biomedical relations from a document with long-distance dependencies and complex semantics, and to discuss the relative benefits of different entity pretreatment methods for biomedical relation extraction.

Methods: This paper proposes a new data preprocessing method and applies a pretrained self-attention structure, combined with an entity replacement method, to document-level biomedical relation extraction in order to capture very long-distance dependencies and complex semantics.

Results: Compared with state-of-the-art approaches, our method greatly improved precision and also increased the F1 value. Experiments with biomedical entity pretreatments showed that a model using an entity replacement method can improve performance.

Conclusions: When all target entity pairs in the document-level dataset are considered as a whole, a pretrained self-attention structure is suitable for capturing very long-distance dependencies and for learning the textual context and complicated semantics. A replacement method for biomedical entities is conducive to biomedical relation extraction, especially document-level relation extraction.
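As an illustration of the entity replacement idea named in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes that mentions are given as character spans with an entity type and that each mention is swapped for a type placeholder. The placeholder format (`@CHEMICAL$`), the helper names `span_of` and `replace_entities`, and the example sentence are all hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# A mention is (start, end, entity_type); end is exclusive.
# This representation is an assumption made for the sketch.
Mention = Tuple[int, int, str]


def span_of(text: str, phrase: str, entity_type: str) -> Mention:
    """Locate the first occurrence of `phrase` and return it as a mention."""
    start = text.index(phrase)
    return (start, start + len(phrase), entity_type)


def replace_entities(text: str, mentions: List[Mention]) -> str:
    """Replace every annotated mention with a placeholder such as @CHEMICAL$.

    The placeholder format is hypothetical; the source only states that an
    entity replacement method is used.
    """
    pieces = []
    cursor = 0
    for start, end, entity_type in sorted(mentions):
        if start < cursor:
            raise ValueError("overlapping mentions are not handled in this sketch")
        pieces.append(text[cursor:start])          # text before the mention
        pieces.append(f"@{entity_type.upper()}$")  # placeholder for the mention
        cursor = end
    pieces.append(text[cursor:])                   # text after the last mention
    return "".join(pieces)


# Invented two-sentence document whose entity pair crosses a sentence boundary.
doc = "Aspirin was given. Two days later the patient developed gastric bleeding."
mentions = [
    span_of(doc, "Aspirin", "Chemical"),
    span_of(doc, "gastric bleeding", "Disease"),
]
replaced = replace_entities(doc, mentions)
print(replaced)
```

Because the whole document is kept as one sequence, the two placeholders stay in the same input even though the mentions sit in different sentences, which is the situation the abstract describes for document-level extraction.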
