Supervised Machine Learning to Predict Follow-Up Among Adjuvant Endocrine Therapy Patients

Patients on long-term adjuvant endocrine therapy often fail to follow up with their care providers for the recommended duration. We used electronic health record (EHR) data, tumor registry records, and appointment logs to predict follow-up in a cohort of adjuvant endocrine therapy patients. Learning predictors of follow-up may facilitate interventions that improve follow-up rates and, ultimately, patient care in this population.

We selected 1455 adjuvant endocrine therapy patients at Vanderbilt University Medical Center (VUMC) and represented them as a matrix of medical-related, appointment-related, and demographic-related features derived from EHR data. We built and optimized a random forest classifier and a neural network to differentiate between patients who follow up, or fail to follow up, with their care provider for at least five years. We measured follow-up in three ways: through appointments with any care provider, appointments with an oncologist, and adjuvant endocrine therapy medication records. Classifiers make predictions at the start of adjuvant endocrine therapy and additionally use temporal subsets of data to measure how accuracy changes as patient data accrues.

Our best model is a random forest classifier that combines medical-related, appointment-related, and demographic-related features to achieve an AUC of 0.74. The most predictive features for follow-up in our random forest model are total medication counts, patient age, and median income for zip code. We suggest that reliable prediction of follow-up may be correlated with the amount of care received at VUMC (i.e., VUMC primary care).

This study achieved moderately accurate prediction of follow-up in adjuvant endocrine therapy patients from electronic health record data. Predicting follow-up can facilitate interventions that improve follow-up rates and patient care for adjuvant endocrine therapy cohorts. This study demonstrates that EHR data can reveal opportunities to improve patient care.
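
For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen patient who followed up receives a higher predicted score than a randomly chosen patient who did not. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's data or code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = patient followed up, 0 = patient did not (assumed encoding).
    scores: classifier's predicted score for follow-up (higher = more likely).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient in each class")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort of eight patients (not from the study).
labels = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.6, 0.7, 0.3, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value of 0.74 is described as moderately accurate. The pairwise loop is quadratic in cohort size; for larger cohorts a rank-based implementation from an established library would be the more practical choice.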
