A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning

Minimization of the within-cluster sums of squares (WCSS) is one of the most important optimization criteria in cluster analysis. Although cluster analysis modules in commercial software packages typically use heuristic methods for this criterion, optimal approaches can be computationally feasible for problems of modest size. This paper presents a new branch-and-bound algorithm for minimizing WCSS. Algorithmic enhancements include an effective reordering of objects and a repetitive solution approach that precludes the need for splitting the data set, while maintaining strong bounds throughout the solution process. The new algorithm provided optimal solutions for problems with up to 240 objects and eight well-separated clusters. Poorly separated problems with no inherent cluster structure were optimally solved for up to 60 objects and six clusters. The repetitive branch-and-bound algorithm was also successfully applied to three empirical data sets from the classification literature.
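To make the criterion concrete, the following is a minimal sketch of a basic branch-and-bound search for a minimum-WCSS partition. It is not the repetitive algorithm presented in the paper and includes neither the object reordering nor the repetitive solution approach; it only illustrates the general idea under simple assumptions. Objects are assigned one at a time, and a branch is pruned when the WCSS of the partial assignment already equals or exceeds the best complete solution found so far, which is valid because adding an object to a cluster can never decrease that cluster's sum of squares. The names `wcss` and `optimal_partition` are hypothetical, chosen for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def wcss(clusters: List[List[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for members in clusters:
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def optimal_partition(points: List[Point], k: int) -> Tuple[float, List[int]]:
    """Illustrative branch-and-bound: returns (minimum WCSS, cluster labels).

    Assumption: a plain partial-assignment bound, far weaker than the
    bounds described in the abstract; suitable only for tiny inputs.
    """
    n = len(points)
    best_value = float("inf")
    best_labels: List[int] = []
    labels: List[int] = []
    clusters: List[List[Point]] = [[] for _ in range(k)]

    def branch(i: int, used: int) -> None:
        nonlocal best_value, best_labels
        # Feasibility: enough unassigned objects must remain to fill empty clusters.
        if n - i < k - used:
            return
        # Bound: the partial WCSS can only grow as more objects are assigned.
        current = wcss(clusters)
        if current >= best_value:
            return
        if i == n:
            best_value = current
            best_labels = labels.copy()
            return
        # Symmetry breaking: object i joins a used cluster or opens the next one.
        for c in range(min(used + 1, k)):
            clusters[c].append(points[i])
            labels.append(c)
            branch(i + 1, max(used, c + 1))
            labels.pop()
            clusters[c].pop()

    branch(0, 0)
    return best_value, best_labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(optimal_partition(data, 2))
```

For the four toy points in the usage example, the search places the first two objects in one cluster and the last two in another, for a WCSS of 1.0. The running time of this sketch grows very quickly with the number of objects, which is why stronger bounds and careful object ordering matter in practice.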
