Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering

Minimum sum-of-squares clustering consists in partitioning a given set of n points into c clusters so as to minimize the sum of squared distances from the points to the centroids of their clusters. Sherali and Desai (JOGO, 2005) proposed a branch-and-bound algorithm for this problem based on the reformulation-linearization technique (RLT), claiming to solve instances with up to 1,000 points. In this paper, their algorithm is investigated in further detail and some of their computational experiments are reproduced. Our computational times, however, turn out to be drastically larger: for two data sets from the literature, only instances with up to 20 points could be solved in less than 10 h of computer time. Possible reasons for this discrepancy are discussed. Also explored are the effects of a symmetry-breaking rule due to Plastria (EJOR, 2002) and of introducing valid inequalities based on the convex hull of the points, in two dimensions, that may belong to each cluster.
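
To make the objective concrete, the following is a minimal sketch of how the minimum sum-of-squares criterion can be evaluated for a fixed partition. It is an illustration only, not the RLT-based algorithm discussed in the paper; the function name `mssc_objective`, the label encoding, and the four sample points are assumptions introduced here.

```python
from collections import defaultdict


def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    points: sequence of equal-length coordinate tuples (hypothetical input format)
    labels: cluster index assigned to each point, in the same order
    """
    # Group the points by their assigned cluster.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Add the squared distance of every member to that centroid.
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Illustrative data: n = 4 points in the plane, c = 2 clusters.
points = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 4.0)]
print(mssc_objective(points, [0, 0, 1, 1]))  # 10.0
print(mssc_objective(points, [0, 1, 0, 1]))  # 27.0
```

An exact method must find, among all partitions into c clusters, one for which this value is smallest; in the toy data above the first partition is better than the second.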

[1]  M. Brusco A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning , 2006, Psychometrika.

[2]  Khaled S. Al-Sultan,et al.  Computational experience on four algorithms for the hard clustering problem , 1996, Pattern Recognit. Lett..

[3]  Pierre Hansen,et al.  NP-hardness of Euclidean sum-of-squares clustering , 2008, Machine Learning.

[4]  S. Dasgupta The hardness of k-means clustering , 2008 .

[5]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[6]  David W. Ailing Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Volume IV , 1963 .

[7]  Ronald L. Graham,et al.  An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set , 1972, Inf. Process. Lett..

[8]  Pierre Hansen,et al.  An Interior Point Algorithm for Minimum Sum-of-Squares Clustering , 1997, SIAM J. Sci. Comput..

[9]  B. Jaumard,et al.  Cluster Analysis and Mathematical Programming , 2003 .

[10]  E. Forgy,et al.  Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[11]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[12]  Douglas Steinley,et al.  K-means clustering: a half-century synthesis. , 2006, The British journal of mathematical and statistical psychology.

[13]  Pierre Hansen,et al.  Analysis of Global k-Means, an Incremental Heuristic for Minimum Sum-of-Squares Clustering , 2005, J. Classif..

[14]  Hrishikesh D. Vinod Mathematica Integer Programming and the Theory of Grouping , 1969 .

[15]  Robert F. Ling,et al.  Cluster analysis algorithms for data reduction and classification of objects , 1981 .

[16]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[17]  M. Brusco,et al.  A Comparison of Heuristic Procedures for Minimum Within-Cluster Sums of Squares Partitioning , 2007 .

[18]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[19]  Hanif D. Sherali,et al.  Enhancing Lagrangian Dual Optimization for Linear Programs by Obviating Nondifferentiability , 2007, INFORMS J. Comput..

[20]  G. Nemhauser,et al.  Integer Programming , 2020 .

[21]  Nikos A. Vlassis,et al.  The global k-means clustering algorithm , 2003, Pattern Recognit..

[22]  Pierre Hansen,et al.  An improved column generation algorithm for minimum sum-of-squares clustering , 2009, Math. Program..

[23]  Hanif D. Sherali,et al.  Reformulation-Linearization Techniques for Discrete Optimization Problems , 1998 .

[24]  B. Jaumard,et al.  Minimum Sum of Squares Clustering in a Low Dimensional Space , 1996 .

[25]  Boris Mirkin,et al.  Mathematical Classification and Clustering , 1996 .

[26]  J. H. Ward Hierarchical Grouping to Optimize an Objective Function , 1963 .

[27]  Frank Plastria,et al.  Formulating logical implications in combinatorial optimisation , 2002, Eur. J. Oper. Res..

[28]  Hanif D. Sherali,et al.  A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem , 2005, J. Glob. Optim..

[29]  Pierre Hansen,et al.  J-MEANS: a new local search heuristic for minimum sum of squares clustering , 1999, Pattern Recognit..