An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search

Clustering in data mining is a discovery process that groups similar objects into the same cluster. Many clustering algorithms have been designed to meet the differing requirements and constraints of applications. In this paper, we study several k-medoids-based algorithms, including PAM, CLARA, and CLARANS. We propose a novel and efficient approach that reduces the computational complexity of such k-medoids-based algorithms by using the previous medoid index, triangular inequality elimination criteria, and partial distance search. Experimental results on elliptic, curve, and Gauss-Markov databases demonstrate that the proposed algorithm, applied to CLARANS, may reduce the number of distance calculations by 67% to 92% while retaining the same average distance per object. In terms of running time, the proposed algorithm may reduce computation time by 38% to 65% compared with the CLARANS algorithm.
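The abstract names the three speed-ups without spelling them out. The sketch below shows one plausible way they can combine in the step that assigns each object to its nearest medoid, assuming Euclidean distance. `assign_nearest`, `partial_sq_dist`, and `sq_dist` are hypothetical names, and how the paper integrates these tests into CLARANS's swap evaluation is not described in the abstract. The triangular inequality test relies on the fact that if d(m_best, m_j) >= 2 d(x, m_best), then d(x, m_j) >= d(m_best, m_j) - d(x, m_best) >= d(x, m_best), so medoid m_j can be skipped without computing d(x, m_j).

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Full squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def partial_sq_dist(x: Vector, m: Vector, bound: float) -> Optional[float]:
    """Partial distance search (PDS): add squared differences one dimension
    at a time and stop as soon as the running sum reaches `bound`.
    Returns the squared distance if it is below `bound`, else None."""
    total = 0.0
    for xi, mi in zip(x, m):
        diff = xi - mi
        total += diff * diff
        if total >= bound:
            return None  # cannot beat the current best; abandon early
    return total


def assign_nearest(
    points: Sequence[Vector],
    medoids: Sequence[Vector],
    prev_index: Optional[Sequence[int]] = None,
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: assign each point to its nearest medoid.

    Returns (labels, squared distance to the assigned medoid)."""
    k = len(medoids)
    # Squared medoid-to-medoid distances, computed once per medoid set,
    # used by the triangular inequality elimination (TIE) test.
    mm = [[sq_dist(medoids[i], medoids[j]) for j in range(k)] for i in range(k)]
    labels: List[int] = []
    best_sqs: List[float] = []
    for n, x in enumerate(points):
        # Previous medoid index: seed the search with the medoid this point
        # was assigned to last time (medoid 0 if there is no history), which
        # is often already the nearest and so gives a tight initial bound.
        start = prev_index[n] if prev_index is not None else 0
        best, best_sq = start, sq_dist(x, medoids[start])
        for j in range(k):
            if j == start:
                continue
            # TIE: d(m_best, m_j) >= 2 d(x, m_best) implies
            # d(x, m_j) >= d(x, m_best); in squared form this is
            # mm[best][j] >= 4 * best_sq.
            if mm[best][j] >= 4.0 * best_sq:
                continue
            # PDS: evaluate d(x, m_j) only as far as needed.
            d = partial_sq_dist(x, medoids[j], best_sq)
            if d is not None:
                best, best_sq = j, d
        labels.append(best)
        best_sqs.append(best_sq)
    return labels, best_sqs


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (9.0, 11.0)]
    medoids = [(0.0, 0.0), (10.0, 10.0)]
    labels, d2 = assign_nearest(points, medoids)
    print(labels, d2)  # [0, 0, 1, 1] [0.0, 1.0, 0.0, 2.0]
    # Re-running with the previous labels: every candidate is now
    # eliminated by the TIE test before any distance is computed.
    print(assign_nearest(points, medoids, prev_index=labels))
```

In this sketch the k-by-k medoid distance table is the price paid for the TIE test; it is amortized over all objects. Since a CLARANS swap replaces a single medoid, one could plausibly update only one row and column of that table per swap, though the abstract does not say whether the paper does this.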
