Variable Selection for Chronic Disease Outcome Prediction Using a Causal Inference Technique: A Preliminary Study

The ability to predict health outcomes of patients with chronic conditions can support early risk factor identification, better treatment planning, and shared decision making. Compared to prediction tasks for acute conditions, modeling chronic diseases requires careful adjustment for time dependencies among treatments and responses, as well as variable selection to identify significant predictors. In this paper, targeting outcome prediction for chronic conditions that often require multiple medications, we applied causal inference techniques, specifically the g-computation formula and the marginal structural model, to select input variables prior to prediction using Bayesian networks. We propose that this approach allows for interpretable variable selection that also leads to better outcome prediction. An evaluation was performed using electronic health record data from a cohort of chronic kidney disease (CKD) patients to predict CKD progression. We identified effects of individual and concurrently used drugs on patients' kidney function that differ across CKD stages. Lastly, using the proposed variable selection technique, we predicted CKD progression with accuracy as high as 0.74, slightly outperforming logistic regression.
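
To make the g-computation idea concrete, the following is a minimal sketch of the standardization formula E[Y^a] = Σ_l E[Y | A=a, L=l] P(L=l) for a single binary treatment and a single binary confounder. It is not the study's implementation: the study concerns time-dependent treatments in EHR data, whereas this example uses a point treatment on simulated data. The variable names (`L`, `A`, `Y`), the simulated effect size of 2.0, and the helper functions `simulate` and `g_formula_mean` are all assumptions introduced for illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate rows (l, a, y) where l confounds both treatment a and outcome y.

    Assumed data-generating process (illustrative only):
    the true causal effect of a on y is 2.0, and l lowers y by 3.0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.4 else 0
        p_treat = 0.8 if l == 1 else 0.2  # treatment is more likely when l == 1
        a = 1 if rng.random() < p_treat else 0
        y = 2.0 * a - 3.0 * l + rng.gauss(0.0, 1.0)
        rows.append((l, a, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def g_formula_mean(rows, a):
    """Standardized mean outcome under treatment level a.

    Computes sum over l of E[Y | A=a, L=l] * P(L=l) using stratum means.
    """
    n = len(rows)
    total = 0.0
    for l in sorted({row[0] for row in rows}):
        p_l = sum(1 for row in rows if row[0] == l) / n
        stratum_outcomes = [row[2] for row in rows if row[0] == l and row[1] == a]
        total += mean(stratum_outcomes) * p_l
    return total


rows = simulate(20000)

# Naive contrast: ignores the confounder and is biased in this simulation.
naive = mean([y for _, a, y in rows if a == 1]) - mean([y for _, a, y in rows if a == 0])

# g-formula contrast: standardizes over the confounder distribution.
adjusted = g_formula_mean(rows, 1) - g_formula_mean(rows, 0)

print(f"naive difference:     {naive:.2f}")
print(f"g-formula difference: {adjusted:.2f}")
```

In this simulation the naive difference is pulled away from the assumed effect because treated subjects are more often in the `l == 1` stratum, while the standardized contrast should land close to it. A variable selection procedure in this spirit would compare such adjusted effects across candidate treatments, though how the study ranks or thresholds them is not stated in the abstract.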
