Convalescing Cluster Configuration Using a Superlative Framework

Effective data mining methods are vital for discovering knowledge in the databases that have resulted from the enormous growth of data, and various data mining techniques are applied to extract that knowledge. Data clustering is one such descriptive technique; it partitions data objects into disjoint segments. K-means is a versatile algorithm among the various approaches to data clustering, but it and its many adaptations suffer from certain performance problems. To overcome these issues, this paper proposes a superlative algorithm for data clustering. The distinguishing features of the proposed algorithm are that it discretizes the dataset, thereby improving clustering accuracy, and that it adopts the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means, which iteratively assigns the data objects to their respective clusters. The clustered results are evaluated for accuracy and validity. Experiments on datasets from the UC Irvine Machine Learning Repository show that the accuracy and validity measures of the proposed approach are higher than those of the other two approaches, namely simple K-means and the Binary Search method. The results thus indicate that discretization can improve the efficacy of descriptive data mining tasks.
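The pipeline described in the abstract (discretize the dataset, generate initial centroids, then run K-means from those centroids) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which discretization scheme is used, so equal-width binning is chosen, and the binary search initialization is not reproduced; a simple evenly spaced initializer stands in for it. The function names `discretize`, `evenly_spaced_centroids`, and `kmeans` are hypothetical.

```python
from math import dist


def discretize(rows, bins=5):
    """Equal-width binning (an assumed scheme): map each value to a bin index."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant feature has zero range; fall back to width 1.0 to avoid dividing by zero.
    widths = [(max(c) - lo) / bins or 1.0 for c, lo in zip(cols, lows)]
    return [
        [min(int((v - lo) / w), bins - 1) for v, lo, w in zip(row, lows, widths)]
        for row in rows
    ]


def evenly_spaced_centroids(rows, k):
    """Stand-in initializer (NOT the binary search method): k points spread
    evenly between the per-feature minimum and maximum."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [lo + (i + 0.5) * (hi - lo) / k for lo, hi in zip(lows, highs)]
        for i in range(k)
    ]


def kmeans(rows, centroids, max_iter=100):
    """Standard K-means iterations starting from the supplied centroids."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data (made up for this sketch): two well-separated groups of points.
data = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
binned = discretize(data, bins=4)
labels, centers = kmeans(binned, evenly_spaced_centroids(binned, k=2))
print(labels)
```

Because K-means is sensitive to its starting centroids, the initializer is kept as a separate function so that a different strategy, such as the binary search method the paper adopts, could be swapped in without touching the clustering loop.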
