Machine Learning Models to Predict Kidney Stone Recurrence Using 24 Hour Urine Testing and Electronic Health Record-Derived Features

Abstract

Objective: To assess the accuracy of machine learning models in predicting kidney stone recurrence using variables extracted from the electronic health record (EHR).

Methods: We trained three separate machine learning (ML) models (least absolute shrinkage and selection operator regression [LASSO], random forest [RF], and gradient boosted decision tree [XGBoost]) to predict 2-year and 5-year symptomatic kidney stone recurrence from EHR-derived features and 24-hour (24H) urine data (n = 1231). The ML models were compared with logistic regression (LR). A manual retrospective review was performed to identify symptomatic stone events, defined as pain, acute kidney injury, or recurrent infections attributed to a kidney stone identified in the clinic or the emergency department, or any stone requiring surgical treatment. We evaluated performance using the area under the receiver operating characteristic curve (AUC-ROC) and identified the most important features for each model.

Results: The 2- and 5-year symptomatic stone recurrence rates were 25% and 31%, respectively. The LASSO model performed best for symptomatic stone recurrence prediction (2-year AUC: 0.62, 5-year AUC: 0.63). The other models demonstrated modest overall performance (2-year AUC, 5-year AUC): LR (0.585, 0.618), RF (0.570, 0.608), and XGBoost (0.580, 0.621). Patient age was the only feature ranked among the top 5 features of every model. Additionally, the LASSO model prioritized BMI and history of gout for prediction.

Conclusions: Across our cohorts, ML models performed comparably to LR, with the LASSO model achieving the highest AUC at both time points. Further model testing should evaluate the utility of 24H urine features in model structure.
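For readers less familiar with the evaluation metric, the following is a minimal sketch of how AUC-ROC can be computed from predicted risk scores. It is not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise interpretation of AUC, namely the probability that a randomly chosen patient with recurrence receives a higher predicted score than a randomly chosen patient without recurrence, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via pairwise comparison of positive and negative cases.

    labels: 1 for a positive case (e.g. recurrence), 0 for a negative case.
    scores: predicted risk for each case; higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = recurrence, 0 = no recurrence.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.70, 0.20, 0.40, 0.50, 0.10, 0.80, 0.40, 0.30]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.2f}")
```

An AUC of 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect ranking, which is the scale against which the reported values of roughly 0.57 to 0.63 can be read.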
