Clustering through ranking on manifolds

Clustering aims to find useful hidden structure in data. In this paper we present a new clustering algorithm that builds upon the consistency method (Zhou et al., 2003), a semi-supervised learning technique that learns functions that are very smooth with respect to the intrinsic structure revealed by the data. Other methods, e.g., spectral clustering, also obtain good results on data that exhibits such structure. However, unlike spectral clustering, our algorithm effectively detects both global and within-class outliers, as well as the most representative examples in each class. Furthermore, we specify an optimization framework that estimates all learning parameters, including the number of clusters, directly from the data. Finally, we show that the learned cluster models can be used to assign previously unseen points to clusters without re-learning the original cluster model. Encouraging experimental results are obtained on a number of real-world problems.
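The consistency method of Zhou et al. (2003) is only named here, not described. What follows is a minimal NumPy sketch of its closed form, F* = (I - alpha*S)^{-1} Y, where S = D^{-1/2} W D^{-1/2} is the symmetrically normalized Gaussian affinity matrix and Y holds one-hot seed labels. It assumes a hand-picked kernel width `sigma` and smoothing weight `alpha`. The function name and defaults are illustrative. This sketch covers only the underlying semi-supervised method. It does not implement the paper's clustering algorithm, outlier detection, or parameter estimation.

```python
import numpy as np


def consistency_method(X, Y, sigma=1.0, alpha=0.99):
    """Illustrative closed-form consistency method: F* = (I - alpha*S)^{-1} Y.

    X: (n, d) data points; Y: (n, c) one-hot seed labels (zero rows = unlabeled).
    sigma and alpha are assumed, hand-picked hyperparameters.
    """
    # Pairwise squared Euclidean distances between all points.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity matrix W with a zero diagonal (no self-loops).
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}, D = diag(row sums of W).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = X.shape[0]
    # Solve (I - alpha*S) F = Y instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S, Y)


# Usage: two well-separated groups, one labeled seed point in each.
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0],
              [5.0, 5.0], [5.0, 5.5], [5.5, 5.0]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # seed for class 0
Y[3, 1] = 1.0  # seed for class 1
F = consistency_method(X, Y)
labels = F.argmax(axis=1)  # each point takes the class with the largest score
```

Solving the linear system is equivalent to running the original iteration F <- alpha*S*F + (1 - alpha)*Y to convergence, up to a constant scale that does not change the argmax.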