Exemplar-Based Clustering via Simulated Annealing

Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of “exemplars” as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed a new simulated annealing heuristic for the p-median problem and completed a thorough investigation of its computational performance. The salient findings from our experiments are that our new method substantially outperforms a previous implementation of simulated annealing and is competitive with the most effective metaheuristics for the p-median problem.
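To make the setting concrete, the following is a minimal sketch of a generic simulated annealing search for the p-median objective: choose p exemplars so that the summed dissimilarity of every object to its nearest exemplar is as small as possible. It is not the heuristic developed in this paper. The function names (`pmedian_cost`, `anneal_pmedian`), the single-swap neighborhood, the geometric cooling schedule, and all parameter defaults are assumptions chosen only for illustration.

```python
import math
import random


def pmedian_cost(dist, medians):
    """Sum, over all objects, of the dissimilarity to the nearest exemplar."""
    return sum(min(dist[i][j] for j in medians) for i in range(len(dist)))


def anneal_pmedian(dist, p, t0=1.0, cooling=0.95, sweeps=200, seed=0):
    """Generic simulated annealing for the p-median problem (illustrative only).

    dist    : n x n matrix of dissimilarities (list of lists)
    p       : number of exemplars to select, with 1 <= p < n
    t0      : assumed initial temperature
    cooling : assumed geometric cooling factor applied after each sweep
    sweeps  : number of temperature levels; each level tries n swaps
    """
    n = len(dist)
    if not 1 <= p < n:
        raise ValueError("p must satisfy 1 <= p < n")
    rng = random.Random(seed)

    # Start from a random set of p exemplars.
    current = rng.sample(range(n), p)
    cur_cost = pmedian_cost(dist, current)
    best, best_cost = list(current), cur_cost

    temp = t0
    for _ in range(sweeps):
        for _ in range(n):
            # Neighbor: replace one selected exemplar with an unselected object.
            out_pos = rng.randrange(p)
            candidates = [j for j in range(n) if j not in current]
            trial = list(current)
            trial[out_pos] = rng.choice(candidates)
            trial_cost = pmedian_cost(dist, trial)

            # Metropolis rule: always accept improvements; accept a worse
            # solution with probability exp(-delta / temp).
            delta = trial_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_cost = trial, trial_cost
                if cur_cost < best_cost:
                    best, best_cost = list(current), cur_cost
        temp *= cooling

    return sorted(best), best_cost


if __name__ == "__main__":
    # Six points on a line forming two well-separated groups.
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    exemplars, cost = anneal_pmedian(dist, p=2)
    print(exemplars, cost)
```

Each trial recomputes the full objective, which costs O(np) per swap. That is adequate for a small illustration, while practical implementations typically update the objective incrementally.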
