Paper Information - Cascading K-means Clustering and K-Nearest Neighbor Classifier for Categorization of Diabetic Patients


Abstract: Medical data mining is the process of extracting hidden patterns from medical data. This paper presents a hybrid model for classifying the Pima Indian diabetic database (PIDD). The model consists of three stages. In the first stage, K-means clustering is used to identify and eliminate incorrectly classified instances. In the second stage, a genetic algorithm (GA) and correlation-based feature selection (CFS) are cascaded to extract the relevant features: the GA performs a global search over the attributes, and CFS evaluates the fitness of each candidate subset. In the third stage, fine-tuned classification is performed with a K-nearest neighbor (KNN) classifier, which takes as input the correctly clustered instances from the first stage and the feature subset identified in the second stage. Experimental results indicate that cascading K-means clustering and KNN, together with the feature subset identified by GA_CFS, improves the classification accuracy of KNN. The proposed model achieved a classification accuracy of 96.68% on the diabetic dataset.
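The first and third stages described in the abstract can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy data stands in for PIDD, an "incorrectly classified" instance is read as one whose label disagrees with the majority label of its cluster, the K-means centers are initialized with a deterministic farthest-first rule, and the hard-coded `selected_features` list stands in for the output of the GA_CFS stage, which is not implemented.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; returns one cluster index per point.
    Farthest-first initialization is an illustrative choice, not the paper's."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    assign = None
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def filter_by_cluster(X, y, k=2):
    """Stage 1 (assumed reading): keep instances whose label matches
    the majority label of the cluster they fall into."""
    assign = kmeans(X, k)
    majority = {
        c: Counter(lbl for lbl, a in zip(y, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (lbl, a) in enumerate(zip(y, assign)) if majority[a] == lbl]

def project(row, features):
    """Restrict a row to the chosen feature indices."""
    return [row[j] for j in features]

def knn_predict(train_X, train_y, query, k=3):
    """Stage 3: majority vote among the k nearest training instances."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups, an irrelevant third
# feature, and two deliberately mislabeled rows (indices 4 and 9).
X = [[1.0, 1.0, 0.3], [1.0, 2.0, 0.9], [2.0, 1.0, 0.1], [2.0, 2.0, 0.7],
     [1.5, 1.5, 0.5], [8.0, 8.0, 0.2], [8.0, 9.0, 0.8], [9.0, 8.0, 0.4],
     [9.0, 9.0, 0.6], [8.5, 8.5, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

keep = filter_by_cluster(X, y, k=2)   # stage 1: drop label/cluster mismatches
selected_features = [0, 1]            # stage 2 stand-in: assumed GA_CFS output

train_X = [project(X[i], selected_features) for i in keep]
train_y = [y[i] for i in keep]

pred_low = knn_predict(train_X, train_y, project([2.0, 1.5, 0.0], selected_features))
pred_high = knn_predict(train_X, train_y, project([8.5, 9.0, 0.0], selected_features))
print(keep)                 # [0, 1, 2, 3, 5, 6, 7, 8]
print(pred_low, pred_high)  # 0 1
```

On this toy data the two mislabeled rows are removed before KNN sees them, which is the intuition behind cascading the clustering step ahead of the classifier. A real experiment would replace the toy rows with the PIDD attributes and the hard-coded feature list with an actual feature-selection step.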

Asha Gowda Karegowda | A. S. Manjunath
