A Linear Time-Complexity k-Means Algorithm Using Cluster Shifting

The k-means algorithm is known to have a time complexity of O(n^2), where n is the input data size. This quadratic complexity prevents the algorithm from being used effectively in large applications. In this article, an attempt is made to develop an O(n) (linear-order) counterpart of k-means. The underlying modification introduces a directional movement of intermediate clusters, which simultaneously improves the compactness and separability of the cluster structures. The process also yields an improved visualization of the clustered data. A comparison of results obtained with classical k-means and with the present algorithm indicates the usefulness of the new approach.
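For orientation, the classical k-means baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard assign-and-update loop only; it is not the cluster-shifting method, whose details the abstract does not give. The function name `kmeans`, its parameters, and the random choice of initial centers are assumptions made for this sketch.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Classical k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k distinct input points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move again
        labels = new_labels
        # Update step: each center moves to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
```

Each pass of this loop compares every point with every center, so the cost of a run depends on the number of passes needed before the assignments stop changing.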
