Mining Medical Data to Develop Clinical Decision Making Tools in Hemodialysis

The main objective of this work is to develop and apply data mining methods to predict patient outcomes in nephrology care. Cardiovascular events have an incidence of 20% in the first year of hemodialysis (HD). Clinical data routinely collected during HD treatment were extracted from the Fresenius Medical Care database EuCliD (39 independent variables) and used to develop a random forest model that forecasts cardiovascular events in the first year of HD treatment. Two feature selection methods were applied. When evaluated on an independent patient cohort, the resulting models showed significant predictive ability. Our best result was obtained with a random forest built on only 6 variables, identified by the out-of-bag (OOB) estimate of variable importance (area under the ROC curve, AUC: 77.1% ± 2.9%; misclassification error, MCE: 31.6% ± 3.5%).
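The key technical step above, ranking variables by the out-of-bag (OOB) estimate of variable importance and then scoring the classifier by AUC and MCE, can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline. To stay short it assumes binary 0/1 labels, uses depth-1 trees (stumps) in place of full-depth trees, computes Breiman-style unnormalized permutation importance (the mean drop in OOB accuracy after shuffling one variable within each tree's OOB rows), and runs on synthetic data in which only variable 0 carries signal. All function names and parameters (`fit_forest`, `oob_importance`, `n_trees`, `mtry`, the 0.5 threshold) are hypothetical.

```python
import random

# Illustrative sketch only: depth-1 trees (stumps) stand in for full trees,
# and all names here are hypothetical; this is not the authors' pipeline.

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    return 1 if 2 * sum(labels) > len(labels) else 0

def fit_stump(X, y, features):
    """Best (feature, threshold) split among `features` by weighted Gini impurity."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # all candidate variables constant: one class everywhere
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

def fit_forest(X, y, n_trees=50, mtry=2, seed=0):
    """Bagged stumps, each limited to `mtry` random variables; keeps each tree's OOB rows."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))      # rows this tree never saw
        feats = rng.sample(range(p), mtry)
        forest.append((fit_stump([X[i] for i in boot], [y[i] for i in boot], feats), oob))
    return forest

def predict_proba(forest, row):
    """Fraction of trees voting for class 1."""
    return sum(predict_stump(s, row) for s, _ in forest) / len(forest)

def oob_importance(forest, X, y, seed=0):
    """Mean drop in OOB accuracy when one variable is shuffled among a tree's OOB rows."""
    rng = random.Random(seed)
    p = len(X[0])
    total, used = [0.0] * p, 0
    for stump, oob in forest:
        if not oob:
            continue
        used += 1
        base = sum(predict_stump(stump, X[i]) == y[i] for i in oob) / len(oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = sum(predict_stump(stump, X[i][:j] + [v] + X[i][j + 1:]) == y[i]
                       for i, v in zip(oob, shuffled))
            total[j] += base - hits / len(oob)
    return [t / max(used, 1) for t in total]

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def mce(scores, labels, threshold=0.5):
    """Misclassification error: share of cases on the wrong side of the threshold."""
    return sum((s >= threshold) != (lab == 1) for s, lab in zip(scores, labels)) / len(labels)

# Toy usage: synthetic data in which only variable 0 determines the label.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]
forest = fit_forest(X_train, y_train)
importance = oob_importance(forest, X_train, y_train)
top_k = sorted(range(4), key=lambda j: importance[j], reverse=True)[:2]
scores = [predict_proba(forest, row) for row in X_test]
print("ranked variables:", top_k)
print("test AUC:", auc(scores, y_test), "test MCE:", mce(scores, y_test))
```

In the abstract, a ranking of this kind identified the 6-variable model out of 39 candidates; the toy keeps the top 2 of 4 purely for illustration and scores held-out rows to mirror validation on an independent cohort. A real analysis would refit the forest on the selected variables and rely on a mature random forest implementation with full-depth trees.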