Optimal Partitioning of a Data Set Based on the p-Median Model

Abstract: Although the K-means algorithm, which minimizes the within-cluster sums of squared deviations from cluster centroids, is perhaps the most common method in applied cluster analysis, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) from each object to its assigned center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
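To make the p-median objective concrete, the following is a minimal Python sketch of the criterion together with a simple greedy addition heuristic. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the details of its greedy stage, and the Lagrangian relaxation and branch-and-bound stages are not shown. The function names (`pairwise_distances`, `pmedian_cost`, `greedy_pmedian`) and the toy data are hypothetical.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(a, b) for b in points] for a in points]

def pmedian_cost(dist, centers):
    """p-median objective: each object contributes its distance to the nearest center."""
    return sum(min(row[c] for c in centers) for row in dist)

def greedy_pmedian(dist, p):
    """Assumed greedy addition rule: repeatedly add the object that lowers the cost most.

    This yields a feasible solution and an upper bound; it does not guarantee optimality.
    """
    n = len(dist)
    centers = []
    for _ in range(p):
        candidates = [j for j in range(n) if j not in centers]
        best = min(candidates, key=lambda j: pmedian_cost(dist, centers + [j]))
        centers.append(best)
    return centers, pmedian_cost(dist, centers)

# Hypothetical toy data: two well-separated groups of three points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
dist = pairwise_distances(points)
centers, cost = greedy_pmedian(dist, p=2)
print(sorted(centers), cost)
```

In a multi-stage scheme of the kind the abstract describes, a heuristic solution like this one would typically serve as an initial upper bound, which a bounding method can then try to match in order to certify optimality.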
