Bio-semantic relation extraction with attention-based external knowledge reinforcement

Background: Semantic resources such as knowledge bases contain high-quality structured knowledge, and curating them requires significant effort from domain experts. Using these resources to reinforce information retrieval from unstructured text could make fuller use of both the unstructured text and the curated knowledge.

Results: We propose a novel method that uses a deep neural network incorporating prior knowledge to improve the automated extraction of biological semantic relations from the scientific literature. The model is a recurrent neural network that combines an attention mechanism with semantic resources, namely UniProt and BioModels. We evaluate the method on the BioNLP and BioCreative corpora, which are sets of manually annotated biological text. The experiments demonstrate that the method outperforms the current state-of-the-art models and that structured semantic information can improve the results of biomedical text mining.

Conclusion: The experimental results show that our approach makes effective use of external prior knowledge and improves performance on the protein-protein interaction extraction task. Although the method is validated only on biomedical text, it should generalize to other types of data.
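To make the idea of attention guided by external knowledge concrete, the following is a minimal sketch and not the model described above. It assumes that each token already has a hidden state (for example, from a recurrent network) and that an entity's knowledge-base entry has been embedded as a vector of the same size. The dot-product scoring, the function name knowledge_attention, and the toy vectors are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def knowledge_attention(hidden_states, knowledge_vec):
    """Hypothetical helper: score each token's hidden state against a
    knowledge vector, normalise the scores into attention weights, and
    return the weights together with the weighted sum of hidden states."""
    scores = [dot(h, knowledge_vec) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Toy stand-ins (assumed values): three token hidden states and one
# knowledge embedding, all two-dimensional.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
knowledge_vec = [1.0, 1.0]

weights, context = knowledge_attention(hidden_states, knowledge_vec)
```

In this toy setting the third token aligns best with the knowledge vector, so it receives the largest attention weight. A real system would learn the scoring function and derive the knowledge embeddings from resources such as UniProt or BioModels.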
