GA based Dimension Reduction for enhancing performance of k-Means and Fuzzy k-Means: A Case Study for Categorization of Medical Dataset

Medical data mining is the process of extracting hidden patterns from medical data. Among the many clustering algorithms, k-means and fuzzy k-means are two of the most widely used. Both are sensitive to the choice of initial cluster centers and may converge to a local optimum. In addition, the performance of any data mining algorithm depends on the feature subset it is given. This paper attempts to improve the performance of both k-means and fuzzy k-means clustering in two stages. In the first stage, it investigates a wrapper approach to feature selection for clustering, in which a genetic algorithm (GA) serves as a random search technique for subset generation, wrapped around k-means clustering. In the second stage, GA and entropy-based fuzzy clustering (EFC) are used to find the initial centroids for both k-means and fuzzy k-means clustering. Experiments were conducted on a standard medical dataset, the Pima Indians Diabetes Dataset (PIDD). The results show a marked reduction of almost 7% in the classification error of both k-means and fuzzy k-means clustering when GA-selected features and GA-identified initial centroids are used, compared with randomly selected centroids and all features.
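
The first stage described above, a GA searching over feature subsets with k-means as the wrapped evaluator, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fitness definition (error against known class labels after mapping each cluster to its majority class), the GA operators (truncation selection, one-point crossover, bit-flip mutation), and all parameter values are assumptions chosen for illustration. The second stage (GA or EFC seeding of the initial centroids) is not covered here; the sketch seeds k-means randomly.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means with random initial centroids; returns labels."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clustering_error(labels, truth, k):
    """Fraction of points that disagree with the majority class of their cluster."""
    correct = 0
    for c in range(k):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        if classes:
            correct += max(classes.count(v) for v in set(classes))
    return 1 - correct / len(truth)

def fitness(mask, data, truth, k, seed):
    """Wrapper evaluation (assumed form): k-means error on the masked features."""
    if not any(mask):
        return 1.0  # an empty feature subset is treated as the worst case
    projected = [[x for x, keep in zip(row, mask) if keep] for row in data]
    labels = kmeans(projected, k, random.Random(seed))
    return clustering_error(labels, truth, k)

def ga_select(data, truth, k=2, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """GA over binary feature masks; returns (best_mask, best_error)."""
    rng = random.Random(seed)
    n = len(data[0])  # requires at least two features for one-point crossover
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, data, truth, k, seed)

    for _ in range(generations):
        pop.sort(key=score)                 # lower error is better
        parents = pop[: pop_size // 2]      # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    return best, score(best)
```

Calling `ga_select(data, truth, k=2)` with `data` as a list of numeric feature rows and `truth` as the class labels returns a 0/1 mask over the features together with the error that mask achieved. Using class labels in the fitness makes the search supervised, which matches reporting results as classification error but may differ from the criterion used in the paper.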
