Adverse Drug Reaction Concept Normalization in Russian-Language Reviews of Internet Users

Mapping pharmaceutically significant entities expressed in natural language to standardized terms or concepts is a key task in developing systems for pharmacovigilance, marketing, and off-label drug use. This work estimates the accuracy of mapping adverse reaction mentions to concepts from the Medical Dictionary for Regulatory Activities (MedDRA), where the mentions are extracted from reviews of pharmaceutical products written by Russian-speaking Internet users (the normalization task). The proposed solution uses two neural network models: the first encodes concepts, and the second encodes mentions. Both are pre-trained language models, but the second is additionally fine-tuned for the normalization task on the Russian Drug Reviews (RDRS) corpus and on a set of open English-language corpora automatically translated into Russian. This additional fine-tuning increases the accuracy of normalizing adverse drug reaction mentions by 3% on the RDRS corpus. The resulting accuracy of mapping adverse reaction mentions to MedDRA preferred terms in RDRS is 70.9% F1-micro. The paper analyzes the factors that affect accuracy on this task by comparing the RDRS corpus with the CSIRO Adverse Drug Event Corpus (CADEC). The composition of the MedDRA concept set and the number of examples per concept are shown to play a key role. The proposed model reaches a comparable accuracy of 87.5% F1-micro on subsamples of the RDRS and CADEC datasets that share the same set of MedDRA preferred terms.
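
The two-encoder idea described above can be illustrated with a minimal sketch: encode each concept name once, encode a mention, and pick the concept whose vector is closest by cosine similarity. This is not the authors' implementation. The character-trigram encoder below is only a stand-in for the pre-trained language models, and the concept names, example mentions, and function names (`encode`, `cosine`, `normalize`, `micro_f1`) are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

def encode(text, n=3):
    """Stand-in encoder (assumption): a bag of character n-grams.
    The paper uses pre-trained language models here instead."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def normalize(mention, concept_vectors):
    """Return the concept whose vector is most similar to the mention."""
    mention_vec = encode(mention)
    return max(concept_vectors,
               key=lambda name: cosine(mention_vec, concept_vectors[name]))

def micro_f1(gold, predicted):
    """With exactly one predicted concept per mention, micro-averaged F1
    reduces to the fraction of mentions mapped correctly."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Hypothetical toy concept inventory (not actual MedDRA entries or codes).
CONCEPTS = ["Headache", "Nausea", "Insomnia"]
concept_vectors = {name: encode(name) for name in CONCEPTS}

# Hypothetical mentions paired with their gold concepts.
mentions = ["bad headaches", "nauseous", "could not sleep"]
gold = ["Headache", "Nausea", "Insomnia"]

predicted = [normalize(m, concept_vectors) for m in mentions]
score = micro_f1(gold, predicted)
```

In this toy setup the surface-form encoder handles mentions that share characters with the concept name but has no way to link "could not sleep" to "Insomnia". Closing that kind of gap between lay wording and dictionary terms is what learned mention and concept encoders are meant to do.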
