Language‐agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records

We sought to craft a drug safety signalling pipeline associating latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, based on which we trained 10 270 neural‐network models (one for each distinct single‐drug/drug‐pair exposure) predicting the risk of exposure given an embedding vector. We included 2 905 251 admissions between May 2008 and June 2016, with 13 740 564 distinct drug prescriptions; the median number of prescriptions was 5 (IQR: 3–9) and in 1 184 340 (41%) admissions patients used ≥5 drugs concomitantly. A total of 10 788 259 clinical notes were included, with 179 441 739 tokens retained after pruning. Of 345 single‐drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships; 186 (54%) signals were clinically meaningful. Sixteen (14%) of the 115 drug‐pair signals were possible interactions, and two (1.7%) were known. In conclusion, we built a language‐agnostic pipeline for mining associations between free‐text information and medication exposure without manual curation, predicting not only the likely outcome of a range of exposures but also the likely exposures for outcomes of interest. Our approach may help overcome limitations of text mining methods relying on curated data in English and can help leverage non‐English free text for pharmacovigilance.
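As a rough illustration of the modelling idea described above (one model per exposure, each predicting the risk of that exposure from a note embedding), the following Python sketch may help. It is not the study's implementation: the two-dimensional token vectors, the drug names `drug_a` and `drug_b`, the averaging in `embed_note`, and the single logistic unit standing in for each neural network are all assumptions chosen to keep the example small and dependency-free.

```python
import math

# Hypothetical token vectors; a real pipeline would take these from a trained fastText model.
TOKEN_VECTORS = {
    "bleeding": [1.0, 0.0],
    "bruise": [0.9, 0.1],
    "cough": [0.0, 1.0],
    "fever": [0.1, 0.9],
}
DIM = 2


def embed_note(tokens):
    """Average the vectors of known tokens (an assumed, simplified note embedding)."""
    known = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_exposure_model(X, y, epochs=200, lr=0.5):
    """Fit one logistic unit by stochastic gradient descent (stand-in for a neural network)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - target  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_exposure(model, x):
    """Return the modelled probability of exposure for one embedded note."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy admissions: (note tokens, set of drugs the patient was exposed to). Invented data.
ADMISSIONS = [
    (["bleeding", "bruise"], {"drug_a"}),
    (["cough", "fever"], {"drug_b"}),
    (["bleeding"], {"drug_a"}),
    (["fever"], {"drug_b"}),
]

X = [embed_note(tokens) for tokens, _ in ADMISSIONS]
exposures = sorted(set().union(*(drugs for _, drugs in ADMISSIONS)))

# One binary model per distinct exposure, mirroring the "one model per exposure" design.
models = {
    drug: train_exposure_model(X, [1.0 if drug in drugs else 0.0 for _, drugs in ADMISSIONS])
    for drug in exposures
}

# Score an unseen note against every exposure model.
new_note = embed_note(["bruise"])
risk = {drug: predict_exposure(model, new_note) for drug, model in models.items()}
```

In this toy setting, a note mentioning only "bruise" scores higher for `drug_a` than for `drug_b`, because its embedding lies close to the notes recorded under `drug_a` exposure. Reading the models in the other direction (which note content raises the predicted risk for a given drug) is what would surface candidate signals for review.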
