PoClustering: Lossless Clustering of Dissimilarity Data

Given a set of objects V with a dissimilarity measure between pairs of objects in V, a PoCluster is a collection of sets P ⊂ powerset(V) partially ordered by the ⊂ relation such that S ⊂ T iff the maximal dissimilarity among objects in S is less than the maximal dissimilarity among objects in T. PoClusters capture categorizations of objects that are not strictly hierarchical, such as those found in ontologies. In general, PoClusters cannot be constructed using hierarchical clustering algorithms. In this paper, we examine the relationship between PoClusters and dissimilarity matrices and prove that PoClusters are in one-to-one correspondence with dissimilarity matrices. The PoClustering problem is NP-complete, and we present a heuristic algorithm for it. Experiments on both synthetic and real datasets demonstrate the quality and scalability of the algorithms.
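
To make the ordering condition concrete, the following is a minimal sketch, not the construction algorithm of this paper. It assumes the dissimilarity measure is stored as a dictionary keyed by unordered pairs, and it checks only the forward direction of the condition (every strict inclusion S ⊂ T must come with a strictly larger maximal dissimilarity). The names `diameter` and `respects_order` and the toy data are hypothetical, introduced for illustration only.

```python
from itertools import combinations


def diameter(subset, d):
    """Maximal pairwise dissimilarity within `subset` (0.0 for fewer than two objects)."""
    return max((d[frozenset(p)] for p in combinations(subset, 2)), default=0.0)


def respects_order(collection, d):
    """True if every strict inclusion S < T in `collection` has diameter(S) < diameter(T)."""
    sets = [frozenset(s) for s in collection]
    return all(
        diameter(s, d) < diameter(t, d)
        for s in sets
        for t in sets
        if s < t  # strict subset test on frozensets
    )


# Hypothetical toy data: three objects, dissimilarities keyed by unordered pairs.
d = {
    frozenset("ab"): 1.0,
    frozenset("bc"): 1.0,
    frozenset("ac"): 2.0,
}

# {a, b} and {b, c} overlap in b, so this collection is not a strict hierarchy.
P = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]

print(respects_order(P, d))
```

In this toy collection the two overlapping sets {a, b} and {b, c} both sit below {a, b, c}, which is the kind of non-hierarchical structure a dendrogram cannot represent.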