How Many Bee Species? A Case Study in Determining the Number of Clusters

It is argued that the determination of the best number of clusters k is crucially dependent on the aim of clustering. Existing supposedly “objective” methods of estimating k ignore this. k can be determined by listing a number of requirements for a good clustering in the given application and finding a k that fulfils them all. The approach is illustrated by application to the problem of finding the number of species in a data set of Australasian tetragonula bees. Requirements here include two new statistics formalising the largest within-cluster gap and cluster separation. Due to the typical nature of expert knowledge, it is difficult to make requirements precise, and a number of subjective decisions is involved.