Health insurance claims contain valuable information for predicting the future health of a population. With many mature machine learning algorithms now available, models can be built to predict future medical costs and hospitalizations. However, it is well known that the way data are represented significantly affects the performance of machine learning algorithms. In health insurance claims, key clinical information comes mainly from the associated clinical codes, such as diagnosis codes and procedure codes, which are hierarchically structured. This study investigates whether the hierarchies of such clinical codes can be used to improve predictive performance when predicting future days in hospital. Empirical investigations were conducted on data sets of different sizes, because the frequency with which lower-level (more specific) clinical codes appear can vary significantly between populations of different sizes. Bagged trees were compared across several feature sets: basic demographic features only, low-level clinical codes, medium-level clinical codes, high-level clinical codes, and a full feature set. The main finding is that the hierarchy level of the clinical codes does not have a significant impact on predictive power. Other findings include: 1) sample size greatly affects the predictive outcome (more observations result in more stable and more accurate outcomes); 2) combined use of enriched demographic features and clinical features gives better performance than using either separately.
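To make the idea of low-, medium-, and high-level clinical codes concrete, the following is a minimal sketch of how one member's codes might be rolled up to coarser levels before being counted as features. It is not the study's method: it assumes ICD-10-style diagnosis codes in which truncating a code yields its parent, and the `LEVELS` mapping, the function names, and the sample codes are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical level definitions (an assumption, not taken from the study):
# number of leading characters kept after removing the dot.
# None keeps the full, most specific code.
LEVELS = {"high": 1, "medium": 3, "low": None}


def roll_up(code, level):
    """Truncate an ICD-10-style code to the requested hierarchy level."""
    cleaned = code.replace(".", "").upper()
    keep = LEVELS[level]
    return cleaned if keep is None else cleaned[:keep]


def code_counts(codes, level):
    """Count how often each rolled-up code appears for one member."""
    return Counter(roll_up(c, level) for c in codes)


# Illustrative codes for a single member (made up for this sketch).
claims = ["E11.9", "E11.65", "I10", "I25.10"]

for level in ("low", "medium", "high"):
    print(level, dict(code_counts(claims, level)))
```

At the low level every specific code is its own feature, so features are sparse in small populations; at the higher levels related codes are merged into fewer, denser features. Count vectors of this kind could then be passed to a bagged-tree learner alongside demographic features.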