Data clustering with modified K-means algorithm

This paper presents a data clustering approach using a modified K-means algorithm that reduces the algorithm's sensitivity to the initial cluster centers (seed points). The algorithm partitions the whole space into segments and calculates the frequency of data points in each segment. The segment with the highest frequency of data points has the highest probability of containing a cluster centroid. The number of cluster centroids (k) is supplied by the user, as in the traditional K-means algorithm, and the space is divided into k × k segments (k vertically and k horizontally). If several segments share the highest frequency and the number of such segments exceeds the threshold k, the segments must be merged, and the k segments with the highest frequency are then used to calculate the initial centroids (seed points) of the clusters. We also define a threshold distance for each cluster centroid; comparing the distance between a data point and a centroid against this threshold reduces the computational effort of the distance calculations. We show how the modified K-means algorithm decreases the complexity and the numerical effort while remaining as easy to implement as the original K-means algorithm, and that it assigns data points to their appropriate clusters more effectively.
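
The seeding and threshold ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `grid_seed_centroids` and `assign_with_threshold` are hypothetical, the data are assumed to be two-dimensional, each seed is assumed to be the mean of the points in its segment, ties are broken by segment index instead of the merging step (which the abstract does not specify), and the threshold is assumed to act as an early exit during assignment.

```python
import math
from collections import defaultdict


def grid_seed_centroids(points, k):
    """Pick initial centroids from the k densest cells of a k-by-k grid.

    Assumptions (not from the paper): 2-D points; a cell's seed is the mean
    of its points; ties are broken by cell index rather than by merging.
    Returns fewer than k seeds if fewer than k cells are non-empty.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to a unit cell size when all points share a coordinate.
    width = (max(xs) - min_x) / k or 1.0
    height = (max(ys) - min_y) / k or 1.0

    # Count the points per cell by collecting them under (row, col) keys.
    cells = defaultdict(list)
    for x, y in points:
        col = min(int((x - min_x) / width), k - 1)
        row = min(int((y - min_y) / height), k - 1)
        cells[(row, col)].append((x, y))

    # Highest frequency first; cell index makes tie order deterministic.
    densest = sorted(cells.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:k]
    return [
        (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for _, pts in densest
    ]


def assign_with_threshold(point, centroids, threshold):
    """Return the index of the cluster chosen for `point`.

    Assumption: a centroid within `threshold` of the point is accepted at
    once and the remaining distances are skipped; otherwise the nearest
    centroid wins.
    """
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        d = math.dist(point, centroid)
        if d <= threshold:
            return i
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
seeds = grid_seed_centroids(points, k=2)
labels = [assign_with_threshold(p, seeds, threshold=2.0) for p in points]
```

In this sketch the early exit saves distance computations only when the threshold is small relative to the spacing between centroids; a threshold that is too large could assign a point to a centroid that is not its nearest one.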