Cascading adverse drug event detection in electronic health records

The ability to detect adverse drug events (ADEs) in electronic health records (EHRs) is useful in many medical applications, such as alerting systems that indicate when an ADE-specific diagnosis code should be assigned. One way to automate the detection of ADEs is to apply machine learning to existing, labeled EHR data. How to do this effectively is, however, an open question. This study addresses two issues concerning the granularity of the classification task: (1) If we wish to predict the occurrence of any ADE, is it advantageous to merge the various ADE class labels prior to learning, or should the predictions be merged afterwards? (2) If we wish to predict a family of ADEs, or even a specific ADE, can predictive performance be enhanced by dividing the classification task into a cascading scheme that first predicts, on a coarse level, whether there is an ADE or not; then, if there is, predicts which family the ADE belongs to; and finally predicts the specific ADE within that family? We conduct a series of experiments using a real clinical dataset comprising healthcare episodes that have been assigned one of eight ADE-related diagnosis codes and a set of randomly extracted episodes that have not been assigned any ADE code. The results show that, when distinguishing between ADEs and non-ADEs, merging the various ADE labels prior to learning leads to significantly higher predictive performance in terms of accuracy and area under the ROC curve. A cascade of random forests is then constructed to determine either the family of ADEs or the specific class label; here, performance is indeed enhanced compared with a direct one-step prediction. We conclude that, if predictive performance is of primary importance, the cascading scheme should be preferred over one-step prediction for detecting ADEs in EHRs.
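The coarse-to-fine routing described in the abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: the label names (`no_ADE`, `family_A`, `A1`, and so on), the function names, and the threshold rules standing in for trained models are all hypothetical. The study itself uses random forests at each level; any fitted classifier could be wrapped as a callable in place of the toy rules.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]

NO_ADE = "no_ADE"  # hypothetical label for episodes without any ADE code


def cascade_predict(
    x: Features,
    ade_vs_none: Classifier,
    family_of: Classifier,
    code_within: Dict[str, Classifier],
) -> str:
    """Route one episode through the coarse-to-fine cascade."""
    # Level 1: ADE or not? Non-ADE episodes stop here.
    if ade_vs_none(x) == NO_ADE:
        return NO_ADE
    # Level 2: which ADE family does the episode belong to?
    family = family_of(x)
    # Level 3: which specific ADE within that family?
    return code_within[family](x)


def merge_after_prediction(label: str) -> str:
    """Collapse a specific prediction to the binary ADE / no-ADE task."""
    return NO_ADE if label == NO_ADE else "ADE"


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_ade_vs_none(x: Features) -> str:
    return "ADE" if x[0] > 0.5 else NO_ADE


def toy_family(x: Features) -> str:
    return "family_A" if x[1] > 0.5 else "family_B"


toy_codes: Dict[str, Classifier] = {
    "family_A": lambda x: "A1" if x[2] > 0.5 else "A2",
    "family_B": lambda x: "B1",
}

episodes = [(0.1, 0.9, 0.9), (0.8, 0.9, 0.2), (0.8, 0.1, 0.7)]
predictions = [
    cascade_predict(e, toy_ade_vs_none, toy_family, toy_codes) for e in episodes
]
print(predictions)                                       # ['no_ADE', 'A2', 'B1']
print([merge_after_prediction(p) for p in predictions])  # ['no_ADE', 'ADE', 'ADE']
```

`merge_after_prediction` corresponds to merging labels post prediction in question (1); the alternative, merging prior to learning, would instead relabel the training data as ADE or non-ADE before fitting a single binary model.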
