Performance-Based Prediction of Chronic Kidney Disease Using Machine Learning for High-Risk Cardiovascular Disease Patients

People at high risk of cardiovascular disease are likely to be vulnerable to chronic kidney disease (CKD), and historical medical records can help avert serious kidney complications. In this paper, 12 supervised machine learning algorithms were used to analyze retrospective electronic medical data on CKD. Of the 544 outpatients targeted, 48 failed to meet the inclusion criteria and a further 21 had missing values; these cases were excluded from the study. Preliminary profiling showed that 88.5% of the cases were labeled as advanced CKD and 11.5% as early-stage CKD. The classification task, and the subsequent evaluation of the models, was based on correctly distinguishing these two groups. Of the evaluated algorithms, decision tree, boosted decision tree, and CN2 rule induction were the least accurate, whereas logistic regression (Ridge and Lasso), neural networks (logistic and stochastic gradient descent), and support vector machines (radial basis function and polynomial kernels) achieved high accuracy and efficiency. The polynomial support vector machine was the most efficient and accurate algorithm, with an efficiency of 93.4% and a classification accuracy of 91.7%. The model identified 253 two-dimensional combinations of factors, of which a history of vascular disease and smoking were the most influential. The remaining combinations can provide information for predicting or detecting CKD from historical records. Future research should consider discretizing the glomerular filtration rate (GFR) so that the classification covers all five stages of CKD.
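The suggested GFR discretization could look roughly like the sketch below. It assumes the widely used five-stage thresholds (stage 1: eGFR of at least 90, stage 2: 60 to 89, stage 3: 30 to 59, stage 4: 15 to 29, stage 5: below 15, in mL/min/1.73 m²). The function name `ckd_stage` is hypothetical and not taken from the study. Note that, clinically, stages 1 and 2 also require other evidence of kidney damage, so GFR alone does not fully determine them.

```python
def ckd_stage(egfr: float) -> int:
    """Map an eGFR value (mL/min/1.73 m^2) to one of five CKD stages.

    Hypothetical helper using the classic five-stage GFR thresholds;
    an illustrative sketch, not the method used in the study.
    """
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    if egfr >= 90:
        return 1  # normal or high GFR (kidney damage must be shown separately)
    if egfr >= 60:
        return 2  # mildly decreased GFR
    if egfr >= 30:
        return 3  # moderately decreased GFR
    if egfr >= 15:
        return 4  # severely decreased GFR
    return 5      # kidney failure


if __name__ == "__main__":
    # Discretize a few example values into stage labels for a five-class target.
    for value in (105.0, 72.5, 44.0, 20.0, 9.0):
        print(value, "-> stage", ckd_stage(value))
```

Using the stage as a five-class target, rather than a binary early/advanced label, would let the same classifiers be evaluated across all CKD stages.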
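If, as an assumption, the 253 two-dimensional combinations are all unordered pairs drawn from the model's input factors, the count is consistent with 23 factors, since 23 × 22 / 2 = 253. The sketch below enumerates such pairs with `itertools.combinations`. Apart from the two names echoing the abstract, the factor names are hypothetical placeholders, not the study's actual variables.

```python
from itertools import combinations
from math import comb


def factor_pairs(factors):
    """Return every unordered pair (2-dimensional combination) of factor names."""
    return list(combinations(factors, 2))


# Hypothetical feature list: the first two names echo the abstract,
# the remaining 21 are placeholders, giving 23 factors in total.
factors = ["vascular_disease_history", "smoking"] + [f"factor_{i}" for i in range(3, 24)]

pairs = factor_pairs(factors)
assert len(pairs) == comb(len(factors), 2)  # n * (n - 1) / 2 pairs
print(len(factors), len(pairs))  # 23 253
```

Each pair could then be scored by a model to rank which two-factor combinations are most influential.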