Modeling electronic health records in ensembles of semantic spaces for adverse drug event detection

Adverse drug events (ADEs) are heavily under-reported in electronic health records (EHRs). Alerting systems that can detect potential ADEs from patient-specific EHR data would help to mitigate this problem. Machine learning has proven both efficient and effective for this purpose; however, it remains challenging to represent heterogeneous EHR data, which is also high-dimensional and exceedingly sparse, in a way that is conducive to learning high-performing predictive models. Prior work has shown that distributional semantics (natural language processing methods that traditionally model the meaning of words in a semantic, or vector, space on the basis of co-occurrence information) can be exploited to create effective representations of various kinds of sequential EHR data. An important design decision when modeling data in semantic space is the size of the context window around an object of interest: it determines which co-occurrences are taken into account and thereby shapes the resulting semantic space. Here, we report on experiments on 27 clinical datasets demonstrating that predictive performance can be significantly improved by modeling EHR data in ensembles of semantic spaces, that is, multiple semantic spaces built with different context window sizes. A follow-up investigation of how predictive performance changes as more semantic spaces are added to the ensemble shows that accuracy tends to improve with the number of semantic spaces, albeit not monotonically. Finally, several strategies for combining the semantic spaces are explored, demonstrating the advantage of early (feature) fusion over late (classifier) fusion. Semantic space ensembles allow multiple views of (sparse) data to be captured (densely) and thereby improve performance on the task of detecting ADEs in EHRs.
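
To make the role of the context window concrete, here is a minimal sketch that builds one semantic space per window size from patient event sequences. It rests on stated assumptions: each patient's EHR is represented as a chronologically ordered list of event codes, and a plain count-based co-occurrence model is swapped in for simplicity in place of whatever distributional model (for example, a prediction-based, word2vec-style model) a real system might use. The function `build_semantic_space` and the event codes are hypothetical.

```python
from collections import defaultdict


def build_semantic_space(sequences, window):
    """Count-based semantic space over time-ordered patient event sequences.

    Hypothetical helper: each event is mapped to a sparse vector (a dict) of
    co-occurrence counts with the events found within `window` positions
    before or after it in the same patient's sequence.
    """
    space = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, target in enumerate(seq):
            start = max(0, i - window)
            end = min(len(seq), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the target position itself
                    space[target][seq[j]] += 1
    # Convert the nested defaultdicts to plain dicts.
    return {event: dict(ctx) for event, ctx in space.items()}


if __name__ == "__main__":
    # One hypothetical patient: event codes in chronological order.
    patients = [["drug_A", "lab_B", "diag_C", "drug_D"]]
    # An ensemble of semantic spaces: one space per context window size.
    ensemble = {w: build_semantic_space(patients, w) for w in (1, 2)}
    print(ensemble[1]["lab_B"])  # {'drug_A': 1, 'diag_C': 1}
    print(ensemble[2]["lab_B"])  # {'drug_A': 1, 'diag_C': 1, 'drug_D': 1}
```

Comparing the two spaces shows how a wider window pulls more distant events into each event's context, so that spaces built with different window sizes offer different views of the same data.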

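The two fusion strategies compared above can be contrasted in a minimal sketch. It assumes that each patient has already been mapped to one numeric feature vector per semantic space (one "view" per context window size). Purely for illustration, it uses a nearest-centroid classifier with a softmax over negative Euclidean distances as a stand-in learner; this is not necessarily the learner used in the study. All function names and the toy data are hypothetical.

```python
import math
from itertools import chain


def fit_centroids(X, y):
    """Stand-in learner (hypothetical): compute the mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def score(model, x, positive=1):
    """Score in (0, 1) for the positive class: softmax over negative distances."""
    weights = {label: math.exp(-math.dist(x, c)) for label, c in model.items()}
    return weights[positive] / sum(weights.values())


def early_fusion(views_train, y, views_test):
    """Feature fusion: concatenate each patient's vectors from all spaces, fit one model."""
    X_train = [list(chain.from_iterable(parts)) for parts in zip(*views_train)]
    X_test = [list(chain.from_iterable(parts)) for parts in zip(*views_test)]
    model = fit_centroids(X_train, y)
    return [score(model, x) for x in X_test]


def late_fusion(views_train, y, views_test):
    """Classifier fusion: fit one model per space, then average their scores."""
    per_view = []
    for X_train, X_test in zip(views_train, views_test):
        model = fit_centroids(X_train, y)
        per_view.append([score(model, x) for x in X_test])
    return [sum(s) / len(s) for s in zip(*per_view)]


if __name__ == "__main__":
    # Hypothetical toy data: two semantic spaces (e.g. window sizes 1 and 2),
    # four training patients (1 = ADE, 0 = no ADE) and two test patients.
    views_train = [
        [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],  # space 1
        [[1.0, 1.0], [0.8, 1.0], [0.0, 0.0], [0.2, 0.0]],  # space 2
    ]
    y = [1, 1, 0, 0]
    views_test = [
        [[0.95, 0.05], [0.05, 0.95]],  # space 1
        [[0.9, 1.0], [0.1, 0.0]],      # space 2
    ]
    print(early_fusion(views_train, y, views_test))
    print(late_fusion(views_train, y, views_test))
```

Early fusion lets a single model exploit interactions between features drawn from different semantic spaces, whereas late fusion only combines the per-space models' outputs; averaging scores is just one of several possible combination rules.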