Beyond Affinity Propagation: Message Passing Algorithms for Clustering

Affinity propagation is an exemplar-based clustering method that takes as input similarities between data points. It outputs a set of data points that best represent the data (exemplars), and assignments of each non-exemplar point to its most appropriate exemplar, thereby partitioning the data set into clusters. The objective of affinity propagation is to maximize the sum of similarities between the data points and their exemplars. In this thesis, we develop several extensions of affinity propagation. The extensions provide clustering tools that go beyond the capabilities of the basic affinity propagation algorithm, and generalize it to various problems of interest in machine learning. We also investigate alternative approaches to the underlying mechanism of affinity propagation using recent inference techniques that are based on optimization theory. Affinity propagation was first described using a particular graphical model for the exemplar-based clustering problem. We first provide an alternative graphical model and derivation of affinity propagation, which are more amenable to model manipulation. Building on this representation, we develop the capacitated, semi-supervised, and hierarchical affinity propagation algorithms. We also discuss the relationship of affinity propagation to some canonical problems in combinatorial optimization. The underlying mechanism of affinity propagation is an approximate inference procedure known as max-product belief propagation. We provide a comparison of affinity propagation to alternative inference techniques such as max-product linear programming and dual decomposition. We show that, on a collection of benchmark data sets, affinity propagation outperforms these more theoretically justified approaches. We conclude by discussing the contributions and findings of this thesis, and how they relate to current research themes in more general inference problems. We point to several interesting avenues for future research.
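
To make the exemplar-based objective concrete, the following is a minimal sketch that evaluates the sum of similarities for a given exemplar set and finds the best set by exhaustive search on a toy problem. It is not the message-passing algorithm itself, and it is only feasible for a handful of points. The function names, the toy data, the negative squared distance similarity, and the diagonal preference value of -1.0 are all assumptions made for illustration; placing a per-point preference on the diagonal of the similarity matrix follows the usual affinity propagation convention.

```python
from itertools import combinations

import numpy as np


def net_similarity(S, exemplars):
    """Return (objective, assignment) for a fixed exemplar set.

    Each non-exemplar point is assigned to its most similar exemplar.
    Each exemplar is assigned to itself and contributes S[k, k],
    which acts as the preference (cost) of choosing k as an exemplar.
    """
    exemplars = np.asarray(exemplars)
    assign = exemplars[S[:, exemplars].argmax(axis=1)]
    assign[exemplars] = exemplars  # an exemplar always represents itself
    return S[np.arange(S.shape[0]), assign].sum(), assign


def brute_force_exemplars(S):
    """Exhaustively search all non-empty exemplar sets (toy sizes only)."""
    n = S.shape[0]
    best_score, best_set, best_assign = -np.inf, None, None
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            score, assign = net_similarity(S, subset)
            if score > best_score:
                best_score, best_set, best_assign = score, subset, assign
    return best_score, best_set, best_assign


# Hypothetical toy data: two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2  # similarity = negative squared distance
np.fill_diagonal(S, -1.0)            # assumed preference for every point

score, exemplars, assign = brute_force_exemplars(S)
print(exemplars, assign.tolist())
```

On this toy input the search selects the middle point of each group as its exemplar. The exhaustive search grows exponentially with the number of points, which is why an approximate inference procedure such as max-product belief propagation is used in practice.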
