Affinity Propagation and Uncapacitated Facility Location Problems

One of the most important distinctions that must be made in clustering research is the difference between models (or problems) and the methods for solving those problems. Nowhere is this more evident than in evaluations of the popular affinity propagation algorithm (apcluster.m), a MATLAB implementation of a neural clustering method that has received significant attention in the biological sciences and other disciplines. Several authors have compared apcluster.m with methods designed for models that fall within the class of uncapacitated facility location problems (UFLPs). These comparative models include the p-center (or K-center) model and, more importantly, the p-median (or K-median) model. The results across studies conflict and are clouded by the fact that the optimization model underlying apcluster.m, although similar, is slightly different from the p-median model and appreciably different from the p-center model. In this paper, we clarify that apcluster.m is actually a heuristic for a 'maximization version' of another model in the class of UFLPs, known as the simple plant location problem (SPLP). An exact method for the SPLP is described, and the apcluster.m program is compared to a fast heuristic procedure (sasplp.m) both in a simulation experiment and across numerous datasets from the literature. Although the exact method is the preferred approach when computationally feasible, both apcluster.m and sasplp.m are efficient and effective heuristic approaches, with the latter slightly outperforming the former in most instances.
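To make the 'maximization version' of the SPLP concrete, the sketch below scores a candidate set of exemplars and finds the best set by exhaustive enumeration. It is a minimal illustration under stated assumptions, not the exact method or the sasplp.m procedure discussed in the paper. It assumes the affinity propagation convention that the diagonal entry S[k][k] holds the preference of point k, that an exemplar is assigned to itself and earns its preference, and that every other point earns its similarity to the most similar exemplar. Reading the preference as a negated fixed cost and a similarity as a negated assignment cost recovers the usual minimization form of the SPLP. The function names `net_similarity` and `exact_splp_max` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def net_similarity(S: Sequence[Sequence[float]], exemplars: Sequence[int]) -> float:
    """Maximization-form SPLP objective for one exemplar set (assumed convention).

    Exemplars earn their preference S[k][k]; every other point earns the
    similarity to its most similar exemplar.
    """
    ex = set(exemplars)
    total = sum(S[k][k] for k in ex)
    for i in range(len(S)):
        if i not in ex:
            total += max(S[i][k] for k in ex)
    return total


def exact_splp_max(S: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over all 2^n - 1 nonempty exemplar sets.

    Only practical for very small n; it stands in for a real exact method.
    Ties keep the first set found in enumeration order.
    """
    n = len(S)
    best_val, best_set = float("-inf"), ()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            val = net_similarity(S, subset)
            if val > best_val:
                best_val, best_set = val, subset
    return best_val, best_set


# Toy data (hypothetical): two tight groups on a line, similarity is the
# negative squared distance, and every point has the same preference.
x: List[float] = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
pref = -5.0
S = [[pref if i == k else -(x[i] - x[k]) ** 2 for k in range(len(x))]
     for i in range(len(x))]

print(exact_splp_max(S))  # (-14.0, (1, 4))
```

Here the optimum selects the middle point of each group as an exemplar. Note that the number of exemplars is not fixed in advance as in the p-median model; it emerges from the trade-off between the preferences and the similarities, which is the feature that distinguishes the SPLP from the p-median problem.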
