K-Means for Spherical Clusters with Large Variance in Sizes

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results. Keywords—K-Means, Data Clustering, Cluster Analysis.

[1]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[2]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[3]  Jiawei Han,et al.  Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.

[4]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[5]  Tomi Kinnunen,et al.  Improving K-Means by Outlier Removal , 2005, SCIA.

[6]  S. Deelers,et al.  Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance , 2007 .

[7]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[8]  D. Pham,et al.  Selection of K in K-means clustering , 2005 .

[9]  Siddheswar Ray,et al.  Determination of Number of Clusters in K-Means Clustering and Application in Colour Image Segmentation , 2000 .

[10]  Michael R. Anderberg,et al.  Cluster Analysis for Applications , 1973 .

[11]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[12]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[13]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[14]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[15]  Joshua Zhexue Huang,et al.  A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining , 1997, DMKD.

[16]  Abdel-Badeeh M. Salem,et al.  An efficient enhanced k-means clustering algorithm , 2006 .

[17]  Robin Sibson,et al.  SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method , 1973, Comput. J..

[18]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[19]  Yoshua Bengio,et al.  Convergence Properties of the K-Means Algorithms , 1994, NIPS.

[20]  Aidong Zhang,et al.  WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.

[21]  A. S. Thoke,et al.  International Journal of Electrical and Computer Engineering 3:16 2008 Fault Classification of Double Circuit Transmission Line Using Artificial Neural Network , 2022 .

[22]  Paul S. Bradley,et al.  Refining Initial Points for K-Means Clustering , 1998, ICML.