Paper information - Prediction of diseases by cascading clustering and classification

Prediction of diseases by cascading clustering and classification

Disease diagnosis is one of the application areas in which data mining techniques help extract knowledge from medical databases. Recently, researchers have investigated cascading more than one technique and have reported enhanced diagnostic results. This paper proposes a hybrid model that uses K-means as a preprocessing algorithm. The proposed model is developed in four stages. In the first stage, datasets selected from the UCI repository are cleaned by deleting all instances with missing values. In the second stage, the Best First search algorithm and correlation-based feature selection (CFS) are used in a cascaded fashion to select relevant features. In the third stage, the resulting binary-class dataset is clustered into two segments using K-means, and incorrectly clustered samples are eliminated to obtain the final samples. Finally, the correctly clustered samples from the previous stage are used to train 12 different classifiers, evaluated with stratified 10-fold cross-validation, to build the final classifier model. Experimental results showed that cascading K-means clustering and classification, with CFS and Best First as the feature selection method, achieved an average classification accuracy of 95% or above on 5 different medical datasets with all 12 classifiers.
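The cleaning and clustering stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`drop_missing`, `kmeans`, `filter_by_cluster`), the toy data, and the rule that a cluster is mapped to its majority class are all assumptions made for illustration. The CFS/Best First feature selection stage and the classifier training stage are omitted.

```python
import random
from collections import Counter


def drop_missing(rows, labels):
    """Stage 1 (assumed form): keep only instances with no missing (None) values."""
    kept = [(r, y) for r, y in zip(rows, labels) if None not in r]
    return [r for r, _ in kept], [y for _, y in kept]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k=2, iters=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # initial centers are k distinct rows
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def filter_by_cluster(rows, labels, seed=0):
    """Stage 3 (assumed form): drop samples whose label differs from the
    majority class of the cluster they were assigned to."""
    assign = kmeans(rows, k=2, seed=seed)
    majority = {
        c: Counter(y for y, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    kept = [(r, y) for r, y, a in zip(rows, labels, assign) if majority[a] == y]
    return [r for r, _ in kept], [y for _, y in kept]


# Hypothetical toy data: two well-separated groups, one instance with a
# missing value, and one instance whose label disagrees with its cluster.
rows = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [None, 1.0],
        [5.0, 5.2], [5.1, 4.9], [4.8, 5.0], [5.0, 5.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 0]

clean_rows, clean_labels = drop_missing(rows, labels)
kept_rows, kept_labels = filter_by_cluster(clean_rows, clean_labels)
```

In this sketch the surviving samples (`kept_rows`, `kept_labels`) would then be passed to the classifiers under stratified 10-fold cross-validation; how the paper itself decides that a sample is "incorrectly clustered" may differ from the majority-class rule assumed here.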

B. V. Sumana | T. Santhanam
