Paper information - Clustering and Classifying Diabetic Data Sets Using K-means Algorithm

The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used directly to cluster real-world data containing categorical values. In this paper we present the classification of a diabetic data set and the application of the k-means algorithm to categorical domains. Before classification, the data set is preprocessed to remove noise. We use a missing-value algorithm to replace the null values in the data set. This algorithm is also used to improve the classification rate and to cluster the data set using two attributes, namely plasma and pregnancy.
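The pipeline described in the abstract (replace null values, then cluster on the plasma and pregnancy attributes) can be illustrated with a minimal sketch. The abstract does not say which missing-value algorithm is used, so column-mean imputation is assumed here as a stand-in. The records, the function names `impute_mean` and `kmeans`, the choice of k = 2, and the first-k centroid initialisation are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: the paper's "missing value algorithm" is not specified;
    mean imputation is used here only as a simple stand-in.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's iteration) on numeric points.

    Assumption: the first k points seed the centroids so the run is
    reproducible; a real analysis would normally use random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([mean(col) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical records: [plasma glucose, number of pregnancies]; None marks a null value.
records = [
    [85.0, 1],
    [90.0, None],
    [170.0, 8],
    [None, 7],
    [88.0, 2],
    [165.0, 9],
]

clean = impute_mean(records)
labels, centroids = kmeans(clean, k=2)
print(labels)
print(centroids)
```

In this sketch the two attributes are clustered on their raw scales, so plasma values, which span a much wider range, dominate the distance; standardising the attributes first would be a reasonable variation.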

P. Thangaraj | M. Kothainayaki

[1] Asha Gowda Karegowda, et al. Rule based Classification for Diabetic Patients using Cascaded K-Means and Decision Tree C4.5, 2012.

[2] Tutut Herawan, et al. Applying Variable Precision Rough Set for Clustering Diabetics Dataset, 2014, MUE 2014.

[3] Joshua Zhexue Huang, et al. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, 1998, Data Mining and Knowledge Discovery.

[4] K. Thilagavathi, et al. An Approach for Prediction of Diabetic disease by using b-Colouring Technique in Clustering Analysis, 2012.

[5] M. Kannan, et al. Analysis of a Population of Diabetic Patients Databases in Weka Tool, 2011.