Cluster analysis: A toolbox for MATLAB.

Broadly defined, clustering is the search for homogeneous groupings of objects based on some type of available data. Two such tasks are now discussed in (almost) all multivariate analysis texts and implemented in the commercially available statistical software suites for the behavioral and social sciences: hierarchical clustering and the K-means partitioning of some set of objects. This chapter begins with a brief review of these topics, using two illustrative data sets that are carried throughout the chapter for numerical illustration. Later sections develop hierarchical clustering through least squares and the characterizing notion of an ultrametric; K-means partitioning is generalized by rephrasing it as an optimization problem of subdividing a given proximity matrix. In all instances, we rely on the MATLAB computational environment to carry out our analyses: the Statistics Toolbox, for example, for the common hierarchical clustering and K-means methods, and our own open-source MATLAB M-files when the extensions go beyond what is currently available commercially (the latter are freely available as a Toolbox from www.cda.psych.uiuc.edu/clusteranalysis_mfiles). Also, to keep the printed size of the present handbook contribution reasonable, the table of contents, figures, and tables for the full chapter, plus the final section and the header comments for the M-files in Appendix A, are available from www.cda.psych.uiuc.edu/cluster_analysis_parttwo.pdf