A method for controlling complex confounding effects in the detection of adverse drug reactions using electronic health records

OBJECTIVE Electronic health records (EHRs) contain comprehensive clinical information that can be used to detect adverse drug reactions (ADRs). A major challenge in using such comprehensive information is confounding. We propose a novel data-driven method that identifies ADR signals accurately by adjusting for confounders.

MATERIALS AND METHODS We focused on two serious ADRs, rhabdomyolysis and pancreatitis, and used information from 264,155 unique patient records. We identified each ADR using established criteria, selected potential confounders, and then used penalized logistic regression to estimate confounder-adjusted ADR associations. A reference standard was created to evaluate the precision of the proposed method and to compare it with that of four other methods.

RESULTS With the proposed method, precision was 83.3% for rhabdomyolysis and 60.8% for pancreatitis, and we identified several drug safety signals that merit further clinical review.

DISCUSSION The proposed method effectively estimated ADR associations after adjusting for confounders. A main cause of error was probably the nature of the dataset: a substantial number of patients had only a single visit, so the sequence of events could not be determined correctly for them. Performance is likely to improve with EHR data that contain more longitudinal records.

CONCLUSIONS This data-driven method is effective in controlling for confounding, achieving precision higher than or similar to that of four comparators. It has the unique ability to provide insight into confounders for each specific medication-ADR pair, and it can be easily adapted to other EHR systems.
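The idea of a confounder-adjusted association estimated by penalized logistic regression can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the study's implementation: the counts are hypothetical, a single binary confounder stands in for the selected confounders, the penalty is an L1 (lasso) penalty fitted by proximal gradient descent, and the names `CELLS`, `expand`, and `fit_l1_logistic` are invented here. In the toy data the outcome depends only on the confounder, yet the drug is prescribed more often to patients who have it.

```python
import math

# Hypothetical counts (illustration only, not study data):
# (confounder, drug) -> (patients, ADR cases).
# Within each confounder stratum the ADR rate is identical for exposed and
# unexposed patients, so the drug has no effect once the confounder is known.
CELLS = {
    (1, 1): (80, 40),
    (1, 0): (20, 10),
    (0, 1): (20, 2),
    (0, 0): (80, 8),
}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def expand(cells, use_confounder):
    # Turn cell counts into weighted rows: (features, outcome, count).
    rows = []
    for (confounder, drug), (patients, cases) in cells.items():
        features = (drug, confounder) if use_confounder else (drug,)
        rows.append((features, 1, cases))
        rows.append((features, 0, patients - cases))
    return rows

def fit_l1_logistic(rows, n_features, lam, step=1.0, n_iter=5000):
    # Minimise mean logistic loss + lam * sum(|coef|); intercept unpenalised.
    total = sum(count for _, _, count in rows)
    intercept = 0.0
    coefs = [0.0] * n_features
    for _ in range(n_iter):
        grad_intercept = 0.0
        grad = [0.0] * n_features
        for features, outcome, count in rows:
            linear = intercept + sum(c * x for c, x in zip(coefs, features))
            residual = (sigmoid(linear) - outcome) * count / total
            grad_intercept += residual
            for j, x in enumerate(features):
                grad[j] += residual * x
        intercept -= step * grad_intercept
        coefs = [soft_threshold(c - step * g, step * lam)
                 for c, g in zip(coefs, grad)]
    return intercept, coefs

# Crude model: drug only, no penalty (plain logistic regression).
_, crude = fit_l1_logistic(expand(CELLS, False), 1, lam=0.0)
crude_drug = crude[0]

# Adjusted model: drug plus confounder, with a small L1 penalty.
_, adjusted = fit_l1_logistic(expand(CELLS, True), 2, lam=0.01)
adjusted_drug, adjusted_confounder = adjusted

print("crude drug log-odds:", round(crude_drug, 3))
print("adjusted drug log-odds:", round(adjusted_drug, 3))
print("confounder log-odds:", round(adjusted_confounder, 3))
```

In this toy example the crude drug coefficient equals the log of the crude odds ratio (about 1.19), which would look like a safety signal, whereas the adjusted drug coefficient is shrunk to zero once the confounder enters the model. The surviving nonzero confounder coefficient also hints at how such a model can point to the confounders relevant to a given medication-ADR pair.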
