An effective sample preparation method for diabetes prediction

Diabetes is a chronic disorder caused by a malfunction of carbohydrate metabolism, and it has become a serious health problem worldwide. Early and correct detection of diabetes can significantly improve the treatment of diabetic patients and thus reduce the associated complications. Machine learning has become an important tool for prognosis and for a deeper understanding of how diseases such as diabetes are classified. This study proposed a high-precision diagnostic system based on a modified k-means clustering technique. First, noisy, uncertain, and inconsistent samples were detected by the new clustering method and removed from the data set. Then, a diabetes prediction model was built with a Support Vector Machine (SVM). Applied to the Pima Indians Diabetes (PID) data set, the proposed diagnostic system achieved 99.64% classification accuracy under 10-fold cross-validation. Our analysis shows that the new system outperforms both a standalone SVM and the combination of classical k-means and SVM in classification performance and time consumption. Experimental results also indicate that the proposed approach outperforms previous methods.
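The abstract does not say how the k-means algorithm was modified, so the following is only a minimal sketch of one common form of classical k-means sample filtering, the kind of preprocessing the proposed method is compared against. It is not the authors' modified method. Assumptions: k = 2 clusters, a sample counts as "noisy" when its class label disagrees with the majority label of its cluster, features are numeric and already scaled, and the helper names (`kmeans`, `kmeans_filter`) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster index of every point.

    Uses deterministic farthest-first initialization so results are repeatable.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    assign = [-1] * len(points)
    for _ in range(max_iter):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def kmeans_filter(
    X: List[Vector], y: List[int], k: int = 2
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical filter: drop samples whose label disagrees with the
    majority label of the cluster they were assigned to."""
    assign = kmeans(X, k)
    majority = {}
    for j in set(assign):
        labels = [lab for lab, a in zip(y, assign) if a == j]
        majority[j] = Counter(labels).most_common(1)[0][0]
    keep = [i for i, (lab, a) in enumerate(zip(y, assign)) if lab == majority[a]]
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy data: two well-separated groups; the last sample carries a "wrong" label.
    X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    y = [0, 0, 0, 1, 1, 0]
    X_clean, y_clean = kmeans_filter(X, y)
    print(len(X_clean), y_clean)  # prints: 5 [0, 0, 0, 1, 1]
```

The retained samples would then be used to train the classifier, for example scikit-learn's `sklearn.svm.SVC` scored with `sklearn.model_selection.cross_val_score(clf, X_clean, y_clean, cv=10)`. One design choice to keep in mind: running the filter on the full data set before splitting also removes hard samples from the test folds, whereas running it inside each training fold leaves the test folds untouched.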
