Hierarchical Cluster Analysis and Fuzzy Sets

Cluster analysis is a tool for automatically classifying objects into a number of groups using a measure of association, so that objects within a group are similar to one another and objects in different groups are dissimilar. This chapter presents a theory of hierarchical cluster analysis, with emphasis on its relationship to fuzzy relations. The chapter can also serve as an introductory text on methods of cluster analysis, so it includes material that is unrelated to fuzzy sets but necessary for cluster analysis. Readers with little interest in information retrieval may read this chapter immediately after Chapter 2. Cluster analysis is sometimes called clustering; the two terms are used interchangeably here.

There is another class of classification methods called nonhierarchical clustering, which includes well-known methods of fuzzy clustering such as the fuzzy c-means of Bezdek (1981). We do not discuss nonhierarchical clustering, because clustering of documents is not handled by nonhierarchical methods and because textbooks on nonhierarchical clustering already exist (see, e.g., Bezdek, 1981).

Methods of hierarchical cluster analysis are divided into two classes: agglomerative methods and divisive methods. Divisive methods are not discussed here. In this monograph, therefore, hierarchical cluster analysis, or more simply cluster analysis, refers to agglomerative methods. This chapter is divided into eight sections.