A two-stage deep learning approach for extracting entities and relationships from medical texts

This work presents a two-stage deep learning system for Named Entity Recognition (NER) and Relation Extraction (RE) from medical texts. These tasks are a crucial step to many natural language understanding applications in the biomedical domain. Automatic medical coding of electronic medical records, automated summarizing of patient records, automatic cohort identification for clinical studies, text simplification of health documents for patients, early detection of adverse drug reactions or automatic identification of risk factors are only a few examples of the many possible opportunities that the text analysis can offer in the clinical domain. In this work, our efforts are primarily directed towards the improvement of the pharmacovigilance process by the automatic detection of drug-drug interactions (DDI) from texts. Moreover, we deal with the semantic analysis of texts containing health information for patients. Our two-stage approach is based on Deep Learning architectures. Concretely, NER is performed combining a bidirectional Long Short-Term Memory (Bi-LSTM) and a Conditional Random Field (CRF), while RE applies a Convolutional Neural Network (CNN). Since our approach uses very few language resources, only the pre-trained word embeddings, and does not exploit any domain resources (such as dictionaries or ontologies), this can be easily expandable to support other languages and clinical applications that require the exploitation of semantic information (concepts and relationships) from texts. During the last years, the task of DDI extraction has received great attention by the BioNLP community. However, the problem has been traditionally evaluated as two separate subtasks: drug name recognition and extraction of DDIs. To the best of our knowledge, this is the first work that provides an evaluation of the whole pipeline. Moreover, our system obtains state-of-the-art results on the eHealth-KD challenge, which was part of the Workshop on Semantic Analysis at SEPLN (TASS-2018).

[1]  Isabel Segura-Bedmar,et al.  Predicting of anaphylaxis in big data EMR by exploring machine learning approaches , 2018, J. Biomed. Informatics.

[2]  Wang Ling,et al.  Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[3]  Graciela Gonzalez,et al.  BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition , 2007, Pacific Symposium on Biocomputing.

[4]  Xiaolong Wang,et al.  Drug-Drug Interaction Extraction via Convolutional Neural Networks , 2016, Comput. Math. Methods Medicine.

[5]  Alfonso Valencia,et al.  Overview of BioCreAtIvE: critical assessment of information extraction for biology , 2005, BMC Bioinformatics.

[6]  Chandra Bhagavatula,et al.  The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction , 2017, *SEMEVAL.

[7]  Mourad Gridach,et al.  Character-level neural network for biomedical named entity recognition , 2017, J. Biomed. Informatics.

[8]  Paloma Martínez,et al.  Pharmacovigilance through the development of text mining and natural language processing techniques , 2015, J. Biomed. Informatics.

[9]  Rafael Muñoz,et al.  A corpus to support eHealth Knowledge Discovery technologies , 2019, J. Biomed. Informatics.

[10]  Paloma Martínez,et al.  Lessons learnt from the DDIExtraction-2013 Shared Task , 2014, J. Biomed. Informatics.

[11]  Donghong Ji,et al.  Long short-term memory RNN for biomedical named entity recognition , 2017, BMC Bioinformatics.

[12]  Hercules Dalianis,et al.  Applications of Clinical Text Mining , 2018 .

[13]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[14]  Jari Björne,et al.  UTurku: Drug Named Entity Recognition and Drug-Drug Interaction Extraction Using SVM Classification and Domain Knowledge , 2013, SemEval@NAACL-HLT.

[15]  Isabel Segura-Bedmar,et al.  The 1st DDIExtraction-2011 challenge task: Extraction of Drug-Drug Interactions from biomedical texts , 2011 .

[16]  Xiao Sun,et al.  Multichannel Convolutional Neural Network for Biological Relation Extraction , 2016, BioMed research international.

[17]  Ulf Leser,et al.  ChemSpot: a hybrid system for chemical named entity recognition , 2012, Bioinform..

[18]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[19]  Xiaolong Wang,et al.  Drug Name Recognition: Approaches and Resources , 2015, Inf..

[20]  Burr Settles ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text , 2005 .

[21]  Jari Björne,et al.  TEES 2.2: Biomedical Event Extraction for Diverse Corpora , 2015, BMC Bioinformatics.

[22]  Hongfei Lin,et al.  An attention‐based BiLSTM‐CRF approach to document‐level chemical named entity recognition , 2018, Bioinform..

[23]  Paloma Martínez,et al.  The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions , 2013, J. Biomed. Informatics.

[24]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..