Evidence Accumulation Clustering Based on the K-Means Algorithm

The idea of evidence accumulation for combining multiple clusterings was recently proposed [7]. Taking K-means as the basic algorithm for decomposing the data into a large number, k, of compact clusters, evidence on pattern association is accumulated, by a voting mechanism, over multiple clusterings obtained from random initializations of the K-means algorithm. This maps the clusterings into a new similarity measure between patterns. The final data partition is obtained by applying the single-link method to this similarity matrix. In this paper we explore and extend this idea by proposing: (a) the combination of multiple K-means clusterings using variable k; (b) the use of cluster lifetime as the criterion for extracting the final clusters; and (c) the adaptation of this approach to string patterns. This leads to a more robust clustering technique, with fewer design parameters than the previous approach and potential applications in a wider range of problems.
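The pipeline summarized above (repeated K-means runs, pairwise voting, single-link on the resulting similarity matrix) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names and parameters (`kmeans_labels`, `co_association`, `single_link_lifetime`, `evidence_accumulation`, `n_runs`, `k_range`, `seed`) are hypothetical. The sketch assumes numeric patterns given as equal-length tuples, so the string-pattern adaptation is not covered. It also assumes that the "lifetime" of a partition is the gap between consecutive single-link merge levels, and it only considers partitions with between 2 and n-1 clusters.

```python
import random


def kmeans_labels(points, k, rng, max_iter=100):
    """Plain Lloyd's K-means with random initialization; one label per point."""
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_association(partitions, n):
    """Voting: fraction of partitions in which patterns i and j share a cluster."""
    votes = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0
    return [[v / len(partitions) for v in row] for row in votes]


def single_link_lifetime(sim):
    """Single-link merging on a similarity matrix, cut at the widest level gap.

    Assumption: the lifetime of the partition reached after m merges is the
    difference between merge level m and merge level m + 1. Requires n >= 3.
    """
    n = len(sim)
    pairs = sorted(
        ((sim[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []  # (similarity level, i, j), most similar first
    for s, i, j in pairs:
        if find(i) != find(j):
            parent[find(i)] = find(j)
            merges.append((s, i, j))
    levels = [s for s, _, _ in merges]
    # m merges leave n - m clusters; m ranges over 1 .. n - 2
    best_m = max(range(1, n - 1), key=lambda m: levels[m - 1] - levels[m])

    parent[:] = range(n)  # replay only the first best_m merges
    for _, i, j in merges[:best_m]:
        parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def evidence_accumulation(points, n_runs=50, k_range=(2, 6), seed=0):
    """Combine K-means runs with variable k, then extract the final partition."""
    rng = random.Random(seed)
    partitions = [
        kmeans_labels(points, rng.randint(*k_range), rng) for _ in range(n_runs)
    ]
    return single_link_lifetime(co_association(partitions, len(points)))
```

Calling `evidence_accumulation(points)` on a list of tuples returns one integer label per pattern. The upper end of `k_range` must not exceed the number of patterns, since the initial centers are sampled without replacement. The quadratic voting loop is written for readability rather than speed.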

[1] Adrian E. Raftery, et al. Principal Curve Clustering With Noise, 1997.

[2] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[3] Richard C. Dubes, et al. Cluster validity profiles, 1982, Pattern Recognit.

[4] Eric J. Pauwels, et al. Finding regions of interest for content extraction, 1998, Electronic Imaging.

[5] Peter N. Yianilos, et al. Learning String-Edit Distance, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Joachim M. Buhmann, et al. Path Based Pairwise Data Clustering with Application to Texture Segmentation, 2001, EMMCVPR.

[7] Ravi Kothari, et al. On finding the number of clusters, 1999, Pattern Recognit. Lett.

[8] Ana L. N. Fred, et al. Clustering under a hypothesis of smooth dissimilarity increments, 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[9] Joachim M. Buhmann, et al. Unsupervised Learning without Overfitting: Empirical Risk Approximation as an Induction Principle for Reliable Clustering, 1999.

[10] Geoffrey J. McLachlan, et al. Mixture models: inference and applications to clustering, 1989.

[11] Isak Gath, et al. Detection and Separation of Ring-Shaped Clusters Using Fuzzy Clustering, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[12] A. Hardy. On the number of clusters, 1996.

[13] Anil K. Jain, et al. Unsupervised Learning of Finite Mixture Models, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[14] Shigeo Abe. Pattern Classification, 2001, Springer London.

[15] Anil K. Jain. Fundamentals of Digital Image Processing, 2018, Control of Color Imaging Systems.

[16] James C. Bezdek, et al. On cluster validity for the fuzzy c-means model, 1995, IEEE Trans. Fuzzy Syst.

[17] Victor L. Brailovsky, et al. Probabilistic validation approach for clustering, 1995, Pattern Recognit. Lett.

[18] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[19] Boris G. Mirkin, et al. Concept Learning and Feature Selection Based on Square-Error Clustering, 1999, Machine Learning.

[20] Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[21] William D. Penny, et al. Bayesian Approaches to Gaussian Mixture Modeling, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[22] Charles T. Zahn, et al. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, 1971, IEEE Transactions on Computers.

[23] Ana L. N. Fred, et al. Finding Consistent Clusters in Data Partitions, 2001, Multiple Classifier Systems.

[24] David G. Stork, et al. Pattern Classification, 1973.

[25] Mineichi Kudo, et al. MDL-Based Selection of the Number of Components in Mixture Models for Pattern Classification, 1998, SSPR/SPR.

[26] Ana L. N. Fred, et al. Hidden Markov models vs. syntactic modeling in object recognition, 1997, Proceedings of International Conference on Image Processing.

[27] R. Casey, et al. Advances in Pattern Recognition, 1971.

[28] Mohamed A. Ismail, et al. On-line hierarchical clustering, 1998, Pattern Recognit. Lett.

[29] Enrique Vidal, et al. Computation of Normalized Edit Distance and Applications, 1993, IEEE Trans. Pattern Anal. Mach. Intell.