Data clustering using evidence accumulation

We explore the idea of evidence accumulation for combining the results of multiple clusterings. Initially, a set of n d-dimensional patterns is decomposed into a large number of compact clusters; the K-means algorithm performs this decomposition, and several clusterings are obtained from N random initializations of K-means. Taking the co-occurrence of a pair of patterns in the same cluster as a vote for their association, the data partitions are mapped into a co-association matrix of patterns. This n × n matrix represents a new similarity measure between patterns. The final clusters are obtained by applying an MST-based clustering algorithm to this matrix. Results on both synthetic and real data show the ability of the method to identify arbitrarily shaped clusters in multidimensional data.
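
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (kmeans_labels, co_association, mst_clusters) are hypothetical, the votes are normalized by the number of runs N, and the final step builds a maximum spanning tree on the co-association matrix and cuts edges below a threshold of 0.5. The normalization and the threshold are assumptions made for illustration; the abstract does not specify either.

```python
import random

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's K-means from a random start; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each non-empty cluster's center moves to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(points, k, n_runs, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1  # one vote per co-occurrence
    return [[v / n_runs for v in row] for row in votes]

def mst_clusters(coassoc, threshold=0.5):
    """Maximum spanning tree on the co-association matrix (Prim's algorithm),
    then cut tree edges weaker than `threshold` (an assumed cut rule)."""
    n = len(coassoc)
    in_tree = [False] * n
    best_sim = [-1.0] * n   # strongest known link from each node to the tree
    parent = [-1] * n
    edges = []
    for _ in range(n):
        u = max((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_sim[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best_sim[u]))
        for v in range(n):
            if not in_tree[v] and coassoc[u][v] > best_sim[v]:
                best_sim[v] = coassoc[u][v]
                parent[v] = u

    # Connected components of the tree after removing weak edges.
    root = list(range(n))

    def find(i):
        while root[i] != i:
            i = root[i]
        return i

    for u, v, sim in edges:
        if sim >= threshold:
            root[find(u)] = find(v)
    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

A call such as mst_clusters(co_association(points, k, n_runs)) takes a list of coordinate tuples and returns one cluster label per point. Following the abstract, k would be chosen larger than the expected number of clusters so that each K-means run produces many small, compact clusters.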
