Detecting adverse drug events with multiple representations of clinical measurements

Adverse drug events (ADEs) are grossly under-reported in electronic health records (EHRs). This could be mitigated by methods that detect ADEs in EHRs, allowing missing ADE-specific diagnosis codes to be identified and added. A crucial aspect of constructing such systems is finding proper representations of the data, so that the predictive modeling can be as accurate as possible. One category of EHR data that can serve as an indicator of ADEs is clinical measurements. However, using clinical measurements as features is problematic for two reasons: they have a high rate of missing values, and they can be repeated a variable number of times in each patient's health record. In this study, five basic representations of clinical measurements are proposed and evaluated to handle these two problems. An empirical investigation using random forest on 27 datasets from a real EHR database, each with a different ADE target, demonstrates that predictive performance, in terms of accuracy and area under the ROC curve, is higher when clinical measurements are represented crudely as whether they were taken or how many times they were taken for a patient. Furthermore, a sixth alternative, which combines all five basic representations, significantly outperforms all of the basic representations except one. A subsequent analysis of variable importance, conducted with this fused feature set, shows that when a clinical measurement has a high missing rate, the number of times it was taken for a patient is ranked as more informative than its actual values. The observations made with random forest are also confirmed empirically with other commonly employed classifiers. This study demonstrates that the way in which clinical measurements from EHRs are represented has a high impact on ADE detection, and that using multiple representations outperforms using a single basic representation.
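To make the idea of multiple representations concrete, the following is a minimal sketch, not the study's implementation. It turns repeated, partly missing measurements into three per-patient features: whether a measurement was taken, how many times it was taken, and a value-based summary. The abstract does not name the five basic representations, so the mean used here is an assumption, and the record layout, the function name `build_features`, and the toy data are all hypothetical.

```python
from statistics import mean

# Hypothetical toy records: (patient_id, measurement_name, value).
records = [
    ("p1", "creatinine", 80.0),
    ("p1", "creatinine", 95.0),
    ("p1", "sodium", 140.0),
    ("p2", "sodium", 138.0),
]
patients = ["p1", "p2", "p3"]          # p3 has no measurements at all
measurements = ["creatinine", "sodium"]


def build_features(records, patients, measurements):
    """Build one fused feature row per patient (illustrative only)."""
    # Collect every observed value per (patient, measurement) pair.
    values = {(p, m): [] for p in patients for m in measurements}
    for patient, name, value in records:
        values[(patient, name)].append(value)

    features = {}
    for p in patients:
        row = {}
        for m in measurements:
            obs = values[(p, m)]
            row[f"{m}_taken"] = int(bool(obs))   # was it taken at all?
            row[f"{m}_count"] = len(obs)         # how many times?
            # Assumed value-based summary; stays missing (None) if never taken.
            row[f"{m}_mean"] = mean(obs) if obs else None
        features[p] = row
    return features


features = build_features(records, patients, measurements)
for patient, row in features.items():
    print(patient, row)
```

The "taken" and "count" features are always defined, even for a patient with no measurements, whereas the value-based feature stays missing and would still need imputation or another missing-value strategy before most classifiers could use it.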
