Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels

Medication use can cause adverse drug reactions (ADRs), unwanted or unexpected events that are a major safety concern. Drug labels, also known as prescribing information or package inserts, describe ADRs. Systematically identifying ADR information from drug labels is therefore critical; however, the task is challenging because drug labels are written in natural language. In this paper, we present a machine learning- and rule-based system that identifies ADR entity mentions in the text of drug labels and normalizes them to the Medical Dictionary for Regulatory Activities (MedDRA). The machine learning approach is based on a recently proposed deep learning architecture that integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used to normalize the identified ADR mentions to MedDRA terms, extends our in-house text-mining system, SciMiner. We evaluated our system on the Text Analysis Conference (TAC) Adverse Drug Reaction 2017 challenge test data set, consisting of 200 manually curated US FDA drug labels. Our ML-based system achieved a 77.0% F1 score on ADR mention recognition and an 82.6% micro-averaged F1 score on ADR normalization, while the rule-based system achieved F1 scores of 67.4% and 77.6%, respectively. Our study demonstrates that a system composed of a deep learning architecture for entity recognition and a rule-based model for entity normalization is a promising approach for ADR extraction from drug labels.
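As an illustration of the micro-averaged F1 score reported for normalization, the following minimal Python sketch pools true positives, false positives, and false negatives over all drug labels before computing precision and recall. It assumes that each label's gold and predicted annotations are represented as sets of MedDRA term strings; the function name `micro_f1` and the example terms are hypothetical, and the official TAC scoring procedure may differ in its details.

```python
from typing import List, Set


def micro_f1(gold_by_label: List[Set[str]], pred_by_label: List[Set[str]]) -> float:
    """Micro-averaged F1: pool counts over all drug labels, then score once.

    Assumption: each element is the set of normalized terms for one drug label.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_by_label, pred_by_label):
        tp += len(gold & pred)   # terms both annotated and predicted
        fp += len(pred - gold)   # predicted terms missing from the gold set
        fn += len(gold - pred)   # gold terms the system did not predict
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example with two drug labels.
gold = [{"nausea", "headache"}, {"rash"}]
pred = [{"nausea"}, {"rash", "pyrexia"}]
score = micro_f1(gold, pred)
```

Because the counts are pooled before scoring, labels with many ADR terms contribute more to the result than labels with few, unlike a macro average that would weight every label equally.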
