A preliminary study of optimal variable weighting in k-means clustering

Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well. The present study applies a k-means optimal weighting procedure to two empirical data sets and contrasts its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set, its comparative performance depends strongly on the approach used to find seed values for the initial k-means partitioning.
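
The comparison described above can be illustrated with a minimal sketch. It is not the optimal weighting algorithm studied here: the variable weights are simply passed in by the caller, and passing a vector of ones gives the unit-weighting baseline. The split-half scheme, the nearest-centroid assignment, the use of the unadjusted Rand index as the agreement measure, and all function names (`weighted_kmeans`, `nearest_centroid`, `rand_index`, `split_half_agreement`) are assumptions made for illustration. Note that the `seed` argument controls the random choice of initial centroids, which is the step the abstract identifies as influential.

```python
import numpy as np

def weighted_kmeans(X, k, weights, n_iter=50, seed=0):
    """Lloyd-style k-means on variables rescaled by sqrt(weights).

    Rescaling by sqrt(w) makes squared Euclidean distance equal to the
    weighted squared distance sum_j w_j * (x_j - c_j)^2.
    """
    rng = np.random.default_rng(seed)
    Xw = X * np.sqrt(weights)
    # Seed values: k distinct observations chosen at random (an assumption).
    centroids = Xw[rng.choice(len(Xw), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Xw[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid as is
                centroids[j] = Xw[labels == j].mean(axis=0)
    return labels, centroids

def nearest_centroid(X, centroids, weights):
    """Assign each row of X to the closest centroid in the weighted space."""
    Xw = X * np.sqrt(weights)
    d = ((Xw[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def rand_index(a, b):
    """Unadjusted Rand index: share of object pairs on which two partitions agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float((same_a[iu] == same_b[iu]).mean())

def split_half_agreement(X, k, weights, seed=0):
    """Cluster half A, classify half B by A's centroids, and compare that
    assignment with an independent clustering of half B."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half_a, half_b = idx[: len(X) // 2], idx[len(X) // 2:]
    _, centroids_a = weighted_kmeans(X[half_a], k, weights, seed=seed)
    labels_b, _ = weighted_kmeans(X[half_b], k, weights, seed=seed)
    predicted_b = nearest_centroid(X[half_b], centroids_a, weights)
    return rand_index(labels_b, predicted_b)
```

Calling `split_half_agreement(X, k, np.ones(X.shape[1]))` gives the unit-weighting score; calling it again with a non-uniform weight vector, and across several values of `seed`, gives a rough picture of how weighting and seed selection each affect cross-validated agreement.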
