Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches are iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pairwise clustering algorithms need not compute sufficient statistics, and they avoid poor solutions by directly placing similar examples in the same cluster. However, many applications require that each cluster of data be accurately described by a prototype or model, so affinity-based clustering, and its benefits, cannot be directly realized. We describe a technique called "affinity propagation", which combines the advantages of both approaches. The method learns a mixture model of the data by recursively propagating affinity messages. We demonstrate affinity propagation on the problems of clustering image patches for image segmentation and learning mixtures of gene expression models from microarray data. We find that affinity propagation obtains better solutions than mixtures of Gaussians, the K-medoids algorithm, spectral clustering and hierarchical clustering, and that it can either find a pre-specified number of clusters or determine the number of clusters automatically. Interestingly, affinity propagation can be viewed as belief propagation in a graphical model that accounts for pairwise training case likelihood functions and the identification of cluster centers.
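To make the idea of "recursively propagating affinity messages" concrete, the following is a minimal sketch, not the paper's own derivation. It assumes the responsibility/availability message form commonly used for exemplar-based affinity propagation, which may differ from the exact updates derived in this work. The function name `affinity_propagation_sketch`, the parameters `damping` and `iterations`, and the convention of storing each point's exemplar preference on the diagonal of the similarity matrix `S` are all assumptions made for illustration.

```python
import numpy as np

def affinity_propagation_sketch(S, damping=0.5, iterations=200):
    """Illustrative message passing on an (n, n) similarity matrix S.

    Assumption: S[i, k] is the affinity of point i for candidate center k,
    and S[k, k] holds a "preference" that controls how many centers emerge.
    Returns (exemplar indices, cluster label per point).
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(iterations):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        # a(k, k) = sum_{i' != k} max(0, r(i', k))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = diag
        A = damping * A + (1.0 - damping) * A_new

    # A point is chosen as a center when its self-evidence is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

In this sketch, lowering the diagonal preference tends to yield fewer clusters and raising it tends to yield more, which is one way the number of clusters can be left for the procedure to determine. A typical call would build `S` as negative squared distances between points and then set the diagonal to a common preference value.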