Deep Clustering with Self-supervision using Pairwise Data Similarities

Deep clustering incorporates embedding into clustering to find a lower-dimensional space appropriate for clustering. Most of the existing methods try to group similar data points through simultaneously minimizing clustering and reconstruction losses, employing an autoencoder (AE). However, they all ignore the relevant useful information available within pairwise data relationships. In this paper we propose a novel deep clustering framework with self-supervision using pairwise data similarities (DCSS). The proposed method consists of two successive phases. First, we propose a novel AE-based approach that aims to aggregate similar data points near a common group center in the latent space of an AE. The AE's latent space is obtained by minimizing weighted reconstruction and centering losses of data points, where weights are defined based on similarity of data points and group centers. In the second phase, we map the AE's latent space, using a fully connected network MNet, onto a K-dimensional space used to derive the final data cluster assignments, where K is the number of clusters. MNet is trained to strengthen (weaken) similarity of similar (dissimilar) samples. Experimental results on multiple benchmark datasets demonstrate the effectiveness of DCSS for data clustering and as a general framework for boosting up state-of-the-art clustering methods.