Database Clustering Based on Multi-Prototype Representation of Cluster

Clustering is a useful technique for organizing multimedia databases. Representing each cluster by a single prototype may not adequately model clusters of different types, and hence limits clustering performance on data with complex structure. This paper proposes a clustering algorithm based on a multi-prototype representation of clusters. Square-error clustering is first used to produce a number of prototypes that locate regions of high density. These prototypes are then organized into a given number of clusters by an agglomerative method based on a proposed separation measure. New prototypes are added iteratively to improve poor cluster boundaries. As a result, the proposed algorithm can discover clusters of complex structure. Experimental results demonstrate the effectiveness of the proposed algorithm.
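The two main stages described above (square-error clustering to place many prototypes, then agglomerative grouping of those prototypes into clusters) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Lloyd's k-means as the square-error step, uses deterministic farthest-first seeding (our choice), and uses single-linkage distance between prototype groups as a hypothetical stand-in for the paper's proposed separation measure, which the abstract does not define. The boundary-refinement step that adds new prototypes is omitted, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (our choice): repeatedly take the point farthest from those chosen."""
    protos = [points[0]]
    while len(protos) < k:
        protos.append(max(points, key=lambda p: min(dist(p, q) for q in protos)))
    return protos


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[Point]:
    """Square-error (Lloyd's k-means) clustering that returns k prototypes."""
    protos = farthest_first(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, protos[i]))].append(p)
        # Move each prototype to the mean of its points; an empty one stays put.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else old
               for g, old in zip(groups, protos)]
        if new == protos:
            break
        protos = new
    return protos


def merge_prototypes(protos: List[Point], n_clusters: int) -> List[List[int]]:
    """Agglomeratively merge prototype indices into n_clusters groups.

    ASSUMPTION: the separation between two groups is the smallest distance
    between their prototypes (single linkage). This is a placeholder, not the
    separation measure proposed in the paper.
    """
    groups = [[i] for i in range(len(protos))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                s = min(dist(protos[i], protos[j]) for i in groups[a] for j in groups[b])
                if best is None or s < best[0]:
                    best = (s, a, b)
        _, a, b = best
        groups[a].extend(groups[b])
        del groups[b]  # b > a, so index a is unaffected
    return groups


def cluster(points: List[Point], n_prototypes: int, n_clusters: int) -> List[int]:
    """Label each point with the cluster that owns its nearest prototype."""
    protos = kmeans(points, n_prototypes)
    groups = merge_prototypes(protos, n_clusters)
    owner = {i: c for c, g in enumerate(groups) for i in g}
    labels = []
    for p in points:
        nearest = min(range(len(protos)), key=lambda i: dist(p, protos[i]))
        labels.append(owner[nearest])
    return labels


if __name__ == "__main__":
    # Toy data: two elongated, parallel groups of points.
    line_a = [(float(x), 0.0) for x in range(20)]
    line_b = [(float(x), 25.0) for x in range(20)]
    print(cluster(line_a + line_b, n_prototypes=6, n_clusters=2))
```

On this toy data each elongated group is covered by several prototypes, and the merging step joins the prototypes lying along the same line before joining any pair across the gap. The design choice to keep several prototypes per cluster is what lets a cluster's shape follow the data rather than a single center; the quality of the result in general depends heavily on the separation measure, which is why the single-linkage placeholder should not be read as the paper's criterion.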
