Extending Standard Cluster Algorithms to Allow for Group Constraints

Summary. This paper demonstrates how standard cluster algorithms such as K-means or partitioning around medoids can be modified so that the final solution fulfills group constraints, which specify that certain data points must or must not be in the same cluster. An extensible software implementation for the R statistical computing environment is presented that supports user-specified group constraints for clustering with respect to arbitrary distance measures. Finally, we discuss applications of the methodology to market segmentation of household shopping basket panel data and to model diagnostics for finite mixture models.
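
To make the idea of a group constraint concrete, the following is a minimal sketch in Python of one way a "must be in the same cluster" constraint could be built into a K-means style iteration. It is not the R implementation described in this paper. The function name grouped_kmeans, its arguments, and the rule of assigning a whole group to the centroid with the smallest summed squared Euclidean distance are all assumptions made for illustration. Constraints of the "may not be in the same cluster" type are not covered.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grouped_kmeans(
    points: List[Point],
    groups: List[Hashable],
    centroids: List[Point],
    n_iter: int = 50,
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical K-means variant with must-link groups.

    Points sharing a label in `groups` are always assigned to the same
    cluster. Assumed rule: a group goes to the centroid that minimizes
    the sum of squared distances over all members of the group.
    """
    cents = [list(c) for c in centroids]

    # Collect the point indices belonging to each group label.
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)

    assign: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: one decision per group, not per point.
        new_assign = [0] * len(points)
        for idx in members.values():
            costs = [sum(sq_dist(points[i], c) for i in idx) for c in cents]
            best = costs.index(min(costs))
            for i in idx:
                new_assign[i] = best

        # Stop once the assignment no longer changes; the centroids
        # were already computed from this same assignment.
        if new_assign == assign:
            break
        assign = new_assign

        # Update step: ordinary K-means centroid update.
        for k in range(len(cents)):
            cluster = [points[i] for i, a in enumerate(assign) if a == k]
            if cluster:  # keep the old centroid if a cluster is empty
                cents[k] = [sum(col) / len(cluster) for col in zip(*cluster)]

    return assign, cents


# Toy data: the point (6, 6) lies closer to the second starting centroid,
# but it belongs to group "a" and therefore follows the rest of its group.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (6, 6)]
groups = ["a", "a", "b", "b", "a"]
assign, cents = grouped_kmeans(points, groups, [(0, 0), (10, 10)])
```

In this toy example an unconstrained assignment would place the last point in the second cluster, whereas the grouped assignment keeps it with the other members of group "a". Replacing sq_dist with another dissimilarity changes the assignment step only; the mean-based centroid update would then also need to be replaced by a centroid rule suited to that distance.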