Applying machine learning to predict real-world individual treatment effects: insights from a virtual patient cohort

OBJECTIVE We aimed to investigate bias in applying machine learning to predict real-world individual treatment effects.

MATERIALS AND METHODS Using a virtual patient cohort, we simulated real-world healthcare data and applied random forest and gradient boosting classifiers to develop prediction models. Treatment effect was estimated as the difference between the predicted outcomes under treatment and under control. We evaluated the impact of predictors with known effects on treatment and outcome in real-world data (ie, treatment predictors [X1], confounders [X2], treatment effect modifiers [X3], and other outcome risk factors [X4]), and of outcome imbalance, on predicting individual outcomes. Using counterfactuals, we evaluated the percentage of patients with biased predicted individual treatment effects.

RESULTS X4 had a greater impact on model performance than X2 and X3 did; no effect was observed for X1. Moderate-to-severe outcome imbalance had a significantly negative impact on model performance, particularly within the subgroups in which the outcome occurred. Bias in predicting individual treatment effects was significant and persisted even when the models had 100% accuracy in predicting the health outcome.

DISCUSSION Inadequate inclusion of X2, X3, and X4 and moderate-to-severe outcome imbalance may degrade model performance in predicting individual outcomes and subsequently bias the predicted individual treatment effects. Machine learning models that included all features and achieved high performance in predicting individual outcomes still yielded biased individual treatment effects.

CONCLUSIONS Direct application of machine learning might not adequately address bias in predicting individual treatment effects. Further method development is needed to advance machine learning for supporting individualized treatment selection.
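Although the study's models were developed on a simulated cohort, the core estimation step (predicting each patient's outcome under treatment and under control and taking the difference) can be illustrated with standard tooling. The sketch below is a minimal, hypothetical example using scikit-learn's RandomForestClassifier; the data-generating step, variable names (X1-X4, treated, outcome), and coefficients are illustrative assumptions, not the study's actual specification.

```python
# Minimal sketch (not the authors' code): estimate individual treatment
# effects (ITEs) as the difference between predicted outcome risks under
# treatment and under control, on a simulated "virtual" cohort.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000

# Hypothetical virtual cohort: treatment predictors (X1), confounders (X2),
# treatment effect modifiers (X3), and other outcome risk factors (X4).
X = pd.DataFrame({
    "X1": rng.normal(size=n),
    "X2": rng.normal(size=n),
    "X3": rng.normal(size=n),
    "X4": rng.normal(size=n),
})

# Assumed data-generating process: treatment depends on X1 and X2; the
# outcome depends on X2, X4, and a treatment effect modified by X3.
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X["X1"] + 0.5 * X["X2"]))))
logit = -1.0 + 0.8 * X["X2"] + 0.6 * X["X4"] + treated * (-0.7 + 0.4 * X["X3"])
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Fit a single outcome model that includes the treatment indicator as a feature.
features = X.assign(treated=treated)
model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(features, outcome)

# Predict each patient's outcome risk with treatment set to 1 and to 0,
# then take the difference as the predicted individual treatment effect.
p_treated = model.predict_proba(features.assign(treated=1))[:, 1]
p_control = model.predict_proba(features.assign(treated=0))[:, 1]
predicted_ite = p_treated - p_control
```

Because the cohort is simulated, the true counterfactual outcomes are known, so each patient's predicted effect can be compared with the true effect to quantify the kind of bias reported above.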
