Clustering in Low-Dimensional Space

It is often asserted that clustering techniques and multidimensional scaling (MDS) play mutually exclusive roles in the analysis of a given set of data: the former is especially indicated when the configuration of points is high-dimensional and has local regions of high density, whereas the latter is more useful when the points are low-dimensional and more evenly distributed. This view has been challenged by, among others, Critchley and Heiser (1988), who showed how a set of objects with ultrametric distances can be mapped as a series of points on a line. Going one step further, it may be argued that clustering is a way to stabilize and robustify the multidimensional scaling task, the aim being to fit a low-dimensional distance model to groups of points rather than to single points. Finding clusters is then not an end in itself, but subordinate to another task. This idea is discussed for the unfolding situation, in which we have to deal with single-peaked variables defined over a common set of points, and for the general MDS situation. A convergent least squares algorithm is described and illustrated. The approach leads to a useful decomposition of the badness-of-fit function into between-cluster and within-cluster components. For a fixed partitioning of the points, a least squares version of Gower's (1989) canonical distance analysis is obtained.
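
The alternating idea in the abstract (fit a low-dimensional distance model to groups of points, and decompose the badness of fit into between- and within-cluster parts) can be made concrete with a small numerical sketch. The code below is not the convergent least squares algorithm described in the paper; it is a minimal illustration under simplifying assumptions: raw stress with Euclidean distances, every object constrained to the coordinates of its cluster, a plain gradient step for the cluster coordinates, and brute-force reassignment of objects. All function names are invented for this sketch.

```python
import numpy as np

def cluster_stress(delta, labels, centers):
    """Raw stress when each object is represented by its cluster's low-dimensional point.

    delta   : (n, n) symmetric matrix of observed dissimilarities
    labels  : (n,) cluster index per object
    centers : (K, p) cluster coordinates in p-dimensional space
    """
    X = centers[labels]                                   # object i -> its cluster's coordinates
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices_from(delta, k=1)
    return np.sum((delta[iu] - D[iu]) ** 2)

def stress_decomposition(delta, labels, centers):
    """Split the raw stress into within-cluster and between-cluster components."""
    X = centers[labels]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices_from(delta, k=1)
    same = labels[iu[0]] == labels[iu[1]]
    resid = (delta[iu] - D[iu]) ** 2
    return resid[same].sum(), resid[~same].sum()

def fit_clustered_mds(delta, n_clusters, n_dims=2, n_iter=100, lr=0.005, seed=0):
    """Alternate a gradient step on the cluster coordinates with greedy reassignment."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    labels = rng.integers(n_clusters, size=n)
    centers = rng.normal(size=(n_clusters, n_dims))
    for _ in range(n_iter):
        # (a) gradient of the raw stress with respect to each object's (cluster) coordinates
        X = centers[labels]
        diff = X[:, None, :] - X[None, :, :]              # pairwise coordinate differences
        D = np.linalg.norm(diff, axis=-1)
        coef = (D - delta) / np.maximum(D, 1e-12)         # zero-distance pairs contribute nothing
        np.fill_diagonal(coef, 0.0)
        grad_obj = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        # accumulate the object gradients into their cluster's coordinates (chain rule)
        for k in range(n_clusters):
            centers[k] -= lr * grad_obj[labels == k].sum(axis=0)
        # (b) move each object to the cluster that yields the lowest total stress
        for i in range(n):
            trials = []
            for k in range(n_clusters):
                trial = labels.copy()
                trial[i] = k
                trials.append(cluster_stress(delta, trial, centers))
            labels[i] = int(np.argmin(trials))
    return labels, centers, stress_decomposition(delta, labels, centers)
```

The greedy reassignment step can only lower the stress, whereas the gradient step does so only for a sufficiently small step size; this is one reason a properly convergent least squares update, as described in the paper, is preferable to this brute-force sketch in practice.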

[1] Robert R. Sokal, et al., A statistical method for evaluating systematic relationships, 1958.

[2] J. C. Gower, Generalised canonical analysis, 1989.

[3] L. Tucker, et al., An individual differences model for multidimensional scaling, 1963.

[4] H. Harman, Modern factor analysis, 1961.

[5] Wayne S. DeSarbo, et al., A simulated annealing methodology for clusterwise linear regression, 1989.

[6] Kohsuke Ogawa, et al., An Approach to Simultaneous Estimation and Segmentation in Conjoint Analysis, 1987.

[7] Shokri Z. Selim, et al., K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] M. Wedel, et al., Consumer benefit segmentation using clusterwise linear regression, 1989.

[9] W. Kamakura, A Least Squares Procedure for Benefit Segmentation with Conjoint Experiments, 1988.

[10] A. D. Gordon, et al., An Algorithm for Euclidean Sum of Squares Classification, 1977.

[11] B. Everitt, et al., Multivariate Exploratory Data Analysis: A Perspective on Exploratory Factor Analysis, 1988.

[12] C. Coombs, A theory of data, 1965, Psychological Review.

[13] Hans-Hermann Bock, Classification and Related Methods of Data Analysis, 1988.

[14] Points of view analysis revisited: Fitting multidimensional structures to optimal distance components with cluster restrictions on the variables, 1993.

[15] Reinhard Suck, et al., Progress in mathematical psychology, 1987.

[16] E. B. Andersen, et al., Modern factor analysis, 1961.

[17] Herman Wold, Systems under indirect observation: Causality, structure, prediction, 1982.

[18] P. Duncombe, et al., Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices, 1985.

[19] Willem J. Heiser, et al., Hierarchical trees can be perfectly scaled in one dimension, 1988.

[20] M. Wedel, et al., A fuzzy clusterwise regression approach to benefit segmentation, 1989.

[21] A. Agresti, et al., Multiway Data Analysis, 1989.

[22] P. Legendre, et al., Developments in Numerical Ecology, 1988.

[23] Walter D. Fisher, On Grouping for Maximum Homogeneity, 1958.

[24] W. Heiser, et al., A latent class unfolding model for analyzing single stimulus preference ratings, 1993.

[25] W. Heiser, Joint Ordination of Species and Sites: The Unfolding Technique, 1987.