Robust data clustering

We address the problem of robust clustering by combining the data partitions produced by multiple clusterings, which together form a clustering ensemble. We formulate robust clustering within an information-theoretic framework: mutual information is the underlying concept used to define quantitative measures of agreement, or consistency, between data partitions. Robustness is assessed by the variance of cluster membership, based on bootstrapping. We propose and analyze a voting mechanism on pairwise associations of patterns for combining data partitions. We show that the proposed technique attempts to optimize the mutual-information-based criteria, although optimality is not guaranteed in all situations. This evidence accumulation method is demonstrated using the well-known K-means algorithm to produce the clustering ensembles. Experimental results show that the technique can identify clusters with arbitrary shapes and sizes.
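The voting idea and the agreement measure can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The three label lists are made-up stand-ins for K-means outputs. The consensus step (link pairs whose vote fraction exceeds 0.5, then take connected components) is one simple way to read a partition off the votes. The normalization 2I(A,B)/(H(A)+H(B)) is one common form of normalized mutual information and may differ from the criterion used in the paper. All function names are hypothetical.

```python
from collections import Counter
from math import log

def co_association(partitions):
    """Fraction of partitions in which each pair of patterns shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1  # one vote for the pair (i, j)
    return [[v / m for v in row] for row in votes]

def consensus(co_assoc, threshold=0.5):
    """Assumed combination rule: connected components of pairs voted above threshold."""
    n = len(co_assoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly associated pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co_assoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def entropy(a):
    """Shannon entropy (in nats) of the cluster-size distribution of a partition."""
    n = len(a)
    return -sum((c / n) * log(c / n) for c in Counter(a).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same patterns."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def normalized_mi(a, b):
    """2 I(A,B) / (H(A) + H(B)); defined as 1.0 when both partitions are trivial."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0
    return 2.0 * mutual_information(a, b) / (ha + hb)

# Toy ensemble over six patterns (hypothetical data, not from the paper).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with labels swapped
    [0, 0, 1, 1, 2, 2],  # a finer partition that splits both groups
]
combined = consensus(co_association(ensemble))
print(combined)
print([round(normalized_mi(combined, p), 3) for p in ensemble])
```

In this toy ensemble the pairs that share a cluster in at least two of the three partitions form two groups of three patterns, so the combined partition agrees fully with the first two ensemble members and only partially with the third. Mutual information is invariant to label permutation, which is why the relabeled second partition scores the same as the first.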
