Detection of Type 2 Diabetes Using Clustering Methods – Balanced and Imbalanced Pima Indian Extended Dataset

Diabetes mellitus, widely known as diabetes, is a metabolic illness that causes high blood sugar. Insulin is a hormone produced by the pancreas, an organ situated behind the stomach. Insulin moves glucose from the blood into the cells for energy and storage. In diabetes, the body either does not produce enough insulin or cannot effectively use the insulin it does produce. Untreated high blood glucose can damage the nerves, eyes, kidneys, and other organs. Various data mining tools are available for predicting and analyzing diabetes, and researchers have made many attempts to improve the efficiency of the underlying models. The method proposed here combines dimensionality reduction with clustering. It gives its highest accuracy on the larger dataset, in both the balanced and imbalanced cases. In this paper, large and small datasets are clustered using the K-means approach, the farthest-first method, a density-based technique, the filtered clustering method, and the X-means approach. K-means, density-based clustering, and X-means give the highest accuracy, 75.64%, on the larger balanced dataset when compared with the smaller balanced dataset.
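As an illustration of how an accuracy figure can be obtained from an unsupervised method such as K-means, the following minimal sketch clusters a handful of records and then scores the clusters against known labels by majority vote. It is not the authors' pipeline: the two features (glucose, BMI), the toy values, the deterministic initialization from the first k points, and the majority-vote mapping from clusters to classes are all assumptions made for the example, and no dimensionality reduction or feature scaling is applied.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Plain K-means. Assumption: centroids start at the first k points
    so that the example is deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def cluster_accuracy(assignments, labels):
    """Label each cluster with its majority class, then report the
    fraction of records whose class matches their cluster's label."""
    correct = 0
    for c in set(assignments):
        member_labels = [l for a, l in zip(assignments, labels) if a == c]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical toy records: (plasma glucose, BMI); label 1 = diabetic.
points = [
    (85, 22.0), (90, 24.5), (95, 23.1), (100, 26.0),
    (150, 33.0), (160, 35.5), (170, 31.2), (145, 36.1),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

assignments, centroids = kmeans(points, k=2)
accuracy = cluster_accuracy(assignments, labels)
print(f"cluster assignments: {assignments}")
print(f"accuracy: {accuracy:.2%}")
```

On real data the features would normally be standardized first, since unscaled glucose values dominate the distance here. With an imbalanced dataset, majority-vote accuracy can also look high simply because one class is much larger than the other.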
