A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay among Diabetic Patients

Diabetes is a life-altering medical condition that affects millions of people and results in many hospitalizations per year. Consequently, predicting the length of stay of hospitalized diabetic patients has become increasingly important for staffing and resource planning. Although statistical methods have been used to predict length of stay in hospitalized patients, many powerful machine learning techniques have not yet been explored for this task. In this paper, we compare and discuss the performance of several supervised machine learning algorithms (i.e., multiple linear regression, support vector machines, multi-task learning, and random forests) for predicting long-term versus short-term length of stay of hospitalized diabetic patients.
