Hierarchical clustering with concave data sets

Clustering methods are among the most widely used methods in multivariate analysis. Two main groups of clustering methods can be distinguished: hierarchical and non-hierarchical. Due to the nature of the problem examined, this paper focuses on hierarchical methods such as the nearest neighbour, the furthest neighbour, Ward's method, between-groups linkage, within-groups linkage, centroid and median clustering. The goal is to assess the performance of different clustering methods when using concave sets of data, and also to figure out in which types of different data structures can these methods reveal and correctly assign group membership. The simulations were run in a two- and threedimensional space. Using different standard deviations of points around the skeleton further modified each of the two original shapes. In this manner various shapes of sets with different inter-cluster distances were generated. Generating the data sets provides the essential knowledge of cluster membership for comparing the clustering methods' performances. Conclusions are important and interesting since real life data seldom follow the simple convex-shaped structure, but need further work, such as the bootstrap application, the inclusion of the dendrogram-based analysis or other data structures. Therefore this paper can serve as a basis for further study of hierarchical clustering performance with concave sets.