A Characterization of Linkage-Based Clustering: An Extended Abstract

There is a wide variety of clustering algorithms that, when run on the same data, often produce very different clusterings. Yet there is no principled method to guide the selection of a clustering algorithm. The choice of an appropriate clustering is, of course, task dependent, so we must rely on domain knowledge. The challenge is to communicate such knowledge between the domain expert and the algorithm designer. One way to guide clustering users in selecting an algorithm is to identify important properties that a user may want an algorithm to satisfy, and to determine which algorithms satisfy each of these properties. Clustering users can then use prior knowledge to decide which properties make sense for their application.