Learning Pairwise Similarity for Data Clustering

Each clustering algorithm induces a similarity between given data points, according to the underlying clustering criteria. Given the large number of available clustering techniques, one is faced with the following questions: (a) Which measure of similarity should be used in a given clustering problem? (b) Should the same similarity measure be used throughout the d-dimensional feature space? In other words, are the underlying clusters in given data of similar shape? Our goal is to learn the pairwise similarity between points in order to facilitate a proper partitioning of the data without the a priori knowledge of k, the number of clusters, and of the shape of these clusters. We explore a clustering ensemble approach combined with cluster stability criteria to selectively learn the similarity from a collection of different clustering algorithms with various parameter configurations

[1]  Ana L. N. Fred,et al.  Combining multiple clusterings using evidence accumulation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ludmila I. Kuncheva,et al.  Using diversity in cluster ensembles , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[3]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[4]  Mohamed S. Kamel,et al.  Cluster-Based Cumulative Ensembles , 2005, Multiple Classifier Systems.

[6]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[7]  D. J. Newman,et al.  UCI Repository of Machine Learning Database , 1998 .

[8]  Ana L. N. Fred,et al.  Robust data clustering , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[9]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[10]  Anil K. Jain,et al.  Multiobjective data clustering , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[11]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[12]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.