Chapter 8 - Cluster analysis

This chapter explains how to use coefficients of association between objects or descriptors to cluster them into typologies or species associations. It covers the following topics: definitions (hard clustering, crisp clustering, fuzzy clustering, descriptive clustering, and synoptic clustering), basic model (single linkage clustering, link, dendrogram, edge, node, undirected graph, chain, chaining, connected subgraph, minimum spanning tree, and chain of primary connections), cophenetic matrix, ultrametric property, panoply of clustering methods (sequential versus simultaneous algorithms, agglomeration versus division, monothetic versus polythetic methods, hierarchical versus non-hierarchical methods, constrained clustering methods, and probabilistic versus non-probabilistic methods), hierarchical agglomerative clustering (single linkage agglomerative clustering, complete linkage agglomerative clustering, intermediate linkage clustering, unweighted arithmetic average clustering or UPGMA, weighted arithmetic average clustering or WPGMA, unweighted centroid clustering or UPGMC, weighted centroid clustering or WPGMC, Ward's minimum variance method, general agglomerative clustering model, flexible clustering, and information analysis), reversals in clustering structure, hierarchical divisive clustering (monothetic methods, polythetic methods, division in ordination space, and TWINSPAN), partitioning by K-means, species clustering (biological associations, non-hierarchical complete linkage clustering, concordance analysis, and indicator species), seriation (trellis diagram, heat map, and non-symmetric matrices), multivariate regression trees (MRT), clustering statistics (connectedness, isolation, cophenetic correlation, Gower distance, modified Rand index, and Shepard-like diagram), cluster validation, cluster representation (dendrogram, connected subgraph, and skyline plot), and choice of a clustering method.
Numerical methods are illustrated with real ecological applications drawn from the literature. The chapter ends with a description of relevant software implemented in the R language; it also cites some commercially available statistical packages and programs available from researchers.