A multi-prototype clustering algorithm

Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a data set. Many existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents each cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate the regions of high density, owing to its low computational cost and good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Prototypes with small separations are then grouped into a given number of clusters in an agglomerative manner, and new prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure and is robust to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.
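The following Python sketch illustrates the overall pipeline described above under simple assumptions: k-means plays the role of the squared-error step that places many prototypes in dense regions, the separation measure shown here (prototype distance normalized by within-prototype spread) is only an illustrative stand-in for the measure proposed in the paper, prototypes are merged by single-linkage agglomeration, and the iterative addition of new prototypes is omitted.

```python
# Illustrative sketch of a multi-prototype clustering pipeline.
# The separation measure and linkage choice are assumptions for this example,
# not the specific definitions given in the paper.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def multi_prototype_clustering(X, n_clusters, n_prototypes=20, random_state=0):
    # Step 1: squared-error (k-means) clustering places many prototypes
    # in the high-density regions of the data.
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=random_state).fit(X)
    prototypes = km.cluster_centers_
    labels = km.labels_

    # Within-prototype spread: mean distance of assigned points to their prototype.
    spread = np.array([
        np.mean(np.linalg.norm(X[labels == j] - prototypes[j], axis=1))
        if np.any(labels == j) else 0.0
        for j in range(n_prototypes)
    ])

    # Step 2: a placeholder separation measure between prototype pairs:
    # center distance divided by the summed spreads. Small values mean the
    # two prototypes are poorly separated and likely belong to one cluster.
    d = np.linalg.norm(prototypes[:, None] - prototypes[None, :], axis=2)
    sep = d / (spread[:, None] + spread[None, :] + 1e-12)
    np.fill_diagonal(sep, 0.0)

    # Step 3: agglomeratively group prototypes with small separations into
    # the requested number of clusters (single linkage merges the least
    # separated prototype pairs first).
    Z = linkage(squareform(sep, checks=False), method='single')
    proto_to_cluster = fcluster(Z, t=n_clusters, criterion='maxclust')

    # Each data point inherits the cluster of its nearest prototype.
    return proto_to_cluster[labels] - 1
```

In this sketch a single cluster may be covered by several prototypes, which is what allows non-spherical shapes to be modeled; the paper's iterative refinement (adding prototypes where cluster separation is poor) would wrap another loop around steps 1 to 3.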
