A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering

Minimum sum-of-squares clustering (MSSC) consists of partitioning a given set of n points into k clusters so as to minimize the sum of squared distances from the points to the centroid of their cluster. Peng & Xia (2005) recently established the equivalence between MSSC and a 0-1 semidefinite programming (SDP) model. In this paper, we propose a branch-and-cut algorithm for the underlying 0-1 SDP model. The algorithm obtains exact solutions for fairly large data sets, with computing times comparable to those of the best exact method found in the literature.
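
As an illustration of the objective and of the equivalence mentioned above, the following minimal sketch evaluates the MSSC cost of a fixed partition in two ways: directly from the centroids, and through the trace form Tr(W W^T (I - Z)) commonly associated with the 0-1 SDP model, where W is the n-by-d data matrix and Z is built from the assignment matrix. It is not the branch-and-cut algorithm of the paper. The function names, the toy data, and the assumption that every cluster is non-empty are illustrative choices, not taken from the source.

```python
import numpy as np

def mssc_objective(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total

def assignment_to_Z(labels, k):
    """Build Z = X diag(1/cluster sizes) X^T from a hard assignment.

    Assumes every cluster has at least one point.
    """
    n = len(labels)
    X = np.zeros((n, k))
    X[np.arange(n), labels] = 1.0   # X[i, c] = 1 iff point i is in cluster c
    sizes = X.sum(axis=0)
    return (X / sizes) @ X.T

def trace_objective(points, Z):
    """Trace form of the same cost: Tr(W W^T (I - Z))."""
    n = points.shape[0]
    return np.trace(points @ points.T @ (np.eye(n) - Z))

# Toy data (hypothetical): two well-separated pairs of points in the plane.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
k = 2

Z = assignment_to_Z(labels, k)
direct = mssc_objective(points, labels, k)
via_trace = trace_objective(points, Z)
print(direct, via_trace)
```

For this partition both evaluations agree, and Z has unit row sums, trace k, and satisfies Z Z = Z, which are the structural properties the 0-1 SDP model imposes on its matrix variable.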

[1]  G. Boole An Investigation of the Laws of Thought: On which are founded the mathematical theories of logic and probabilities , 2007 .

[2]  George Boole,et al.  An Investigation of the Laws of Thought: Frontmatter , 2009 .

[3]  R. Fisher The Use of Multiple Measurements in Taxonomic Problems , 1936 .

[4]  K. Florek,et al.  Sur la liaison et la division des points d'un ensemble fini , 1951 .

[5]  R. Fortet L'algèbre de Boole et ses applications en recherche opérationnelle , 1960 .

[6]  E. Forgy,et al.  Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[7]  A. W. Edwards,et al.  A Method for Cluster Analysis , 1965, Biometrics.

[8]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[9]  Werner Dinkelbach On Nonlinear Fractional Programming , 1967 .

[10]  Enrique H. Ruspini,et al.  Numerical methods for fuzzy clustering , 1970, Inf. Sci..

[11]  Keinosuke Fukunaga,et al.  A Branch and Bound Clustering Algorithm , 1975, IEEE Transactions on Computers.

[12]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[13]  P. Hansen,et al.  Complete-Link Cluster Analysis by Graph Coloring , 1978 .

[14]  Rolph E. Anderson,et al.  Multivariate Data Analysis: Text and Readings , 1979 .

[15]  Pierre Hansen,et al.  Bicriterion Cluster Analysis , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Robert F. Ling,et al.  Cluster analysis algorithms for data reduction and classification of objects , 1981 .

[17]  G. Diehr Evaluation of a Branch and Bound Algorithm for Clustering , 1985 .

[18]  Yoshiko Wakabayashi,et al.  A cutting plane algorithm for a clustering problem , 1989, Math. Program..

[19]  Martin Grötschel,et al.  Solution of large-scale symmetric travelling salesman problems , 1991, Math. Program..

[20]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[21]  Mary Inaba,et al.  Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract) , 1994, SCG '94.

[22]  M. Inaba Application of weighted Voronoi diagrams and randomization to variance-based k-clustering , 1994, SoCG 1994.

[23]  Boris Mirkin,et al.  Mathematical Classification and Clustering , 1996 .

[24]  Stephen P. Boyd,et al.  Semidefinite Programming , 1996, SIAM Rev..

[25]  Pierre Hansen,et al.  Variable Neighborhood Search , 2018, Handbook of Heuristics.

[26]  Pierre Hansen,et al.  Cluster analysis and mathematical programming , 1997, Math. Program..

[27]  Hanif D. Sherali,et al.  Reformulation-Linearization Techniques for Discrete Optimization Problems , 1998 .

[28]  Pierre Hansen,et al.  An Interior Point Algorithm for Minimum Sum-of-Squares Clustering , 1997, SIAM J. Sci. Comput..

[29]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[30]  Pierre Hansen,et al.  J-MEANS: a new local search heuristic for minimum sum of squares clustering , 1999, Pattern Recognit..

[31]  Franz Rendl,et al.  Graph partitioning using linear and semidefinite programming , 2003, Math. Program..

[32]  Jiming Peng,et al.  A new theoretical framework for K-means-type clustering , 2004 .

[33]  Hanif D. Sherali,et al.  A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem , 2005, J. Glob. Optim..

[34]  Jiming Peng,et al.  A Cutting Algorithm for the Minimum Sum-of-Squared Error Clustering , 2005, SDM.

[35]  M. Brusco A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning , 2006, Psychometrika.

[36]  D. Steinley Validating Clusters with the Lower Bound for Sum-of-Squares Error , 2007 .

[37]  Jiming Peng,et al.  Approximating K-means-type clustering via semidefinite programming , 2005 .

[38]  Pierre Hansen,et al.  NP-hardness of Euclidean sum-of-squares clustering , 2008, Machine Learning.

[39]  Meena Mahajan,et al.  The Planar k-means Problem is NP-hard I , 2009 .