An ensemble of neural models for nested adverse drug events and medication extraction with subwords

Abstract

Objective: This article describes an ensemble system that automatically extracts adverse drug events and drug-related entities from clinical narratives. The system was developed for the 2018 n2c2 Shared Task Track 2.

Materials and Methods: Using MIMIC-III discharge summaries, we designed a neural model that handles both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types). To better represent rare and unknown words in entities, we further tokenized the MIMIC-III data set by splitting words into finer-grained subwords. Finally, we combined all the models to boost performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model.

Results: Our method achieved a lenient micro F1-score of 92.78%, with a lenient precision of 95.99% and a lenient recall of 89.79%. Experimental results showed that combining the predictions of either multiple models or a single model with different settings can improve performance.

Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types benefit significantly from subword representation, which also allows us to extract sparse entities, especially nested entities.

Conclusion: The overall results demonstrate that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level rather than at the word level.
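The abstract does not spell out how the predictions of the individual models are combined. As a minimal sketch, assuming a simple majority vote over predicted entities (an assumption, not necessarily the rule used in the system), the combination step could look like the following. The function name `majority_vote`, the `(start, end, type)` tuple layout, and the example offsets are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Hypothetical representation: an entity is (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def majority_vote(predictions: List[Set[Entity]],
                  min_votes: Optional[int] = None) -> Set[Entity]:
    """Keep every entity predicted by at least `min_votes` models.

    Assumption: a strict majority is required by default. Each model's
    output is a set, so a model can vote at most once per entity.
    """
    if not predictions:
        return set()
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = Counter(entity for output in predictions for entity in output)
    return {entity for entity, count in votes.items() if count >= min_votes}


# Illustrative predictions from three hypothetical models on one sentence.
model_a = {(0, 8, "Drug"), (0, 14, "Reason"), (20, 24, "ADE"), (30, 35, "Dosage")}
model_b = {(0, 8, "Drug"), (20, 24, "ADE"), (20, 24, "Reason")}
model_c = {(0, 8, "Drug"), (0, 14, "Reason"), (20, 24, "Reason")}

ensembled = majority_vote([model_a, model_b, model_c])
print(sorted(ensembled))
```

Because votes are counted per `(start, end, type)` tuple, a span nested inside another span and a single span carrying two types are each voted on independently, so nested and polysemous entities can both survive the combination. In this example the `Dosage` span, proposed by only one model, is dropped.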
