Contrastive Clustering

In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19\% (39\%) performance improvement compared with the best baseline.

[1]  Fei Wang,et al.  Deep Comprehensive Correlation Mining for Image Clustering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[3]  Andrea Vedaldi,et al.  Self-labelling via simultaneous clustering and representation learning , 2020, ICLR.

[4]  Lingfeng Wang,et al.  Deep Discriminative Clustering Analysis , 2019, ArXiv.

[5]  Zhang Yi,et al.  Robust Subspace Clustering via Thresholding Ridge Regression , 2015, AAAI.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Zhen Wang,et al.  Large Graph Clustering With Simultaneous Spectral Embedding and Discretization , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[9]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[10]  Saman Ghili,et al.  Tiny ImageNet Visual Recognition Challenge , 2014 .

[11]  Cheng Deng,et al.  Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[13]  Xu Ji,et al.  Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  VincentPascal,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010 .

[15]  Junnan Li,et al.  Prototypical Contrastive Learning of Unsupervised Representations , 2020, ICLR.

[16]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Rui Zhang,et al.  Autoencoder Constrained Clustering With Adaptive Neighbors , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[18]  M. Saquib Sarfraz,et al.  Clustering based Contrastive Learning for Improving Face Representations , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).

[19]  Daniel Cremers,et al.  Associative Deep Clustering: Training a Classification Network with No Labels , 2018, GCPR.

[20]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[21]  Masashi Sugiyama,et al.  Learning Discrete Representations via Information Maximizing Self-Augmented Training , 2017, ICML.

[22]  Amos Storkey,et al.  DHOG: Deep Hierarchical Object Grouping , 2020, ArXiv.

[23]  Kaiming He,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Michal Valko,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[25]  En Zhu,et al.  Deep Clustering with Convolutional Autoencoders , 2017, ICONIP.

[26]  Wei-Yun Yau,et al.  Structured AutoEncoders for Subspace Clustering , 2018, IEEE Transactions on Image Processing.

[27]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[28]  G. Krishna,et al.  Agglomerative clustering using the concept of mutual nearest neighbourhood , 1978, Pattern Recognit..

[29]  Feiping Nie,et al.  The Constrained Laplacian Rank Algorithm for Graph-Based Clustering , 2016, AAAI.

[30]  Shaogang Gong,et al.  Deep Semantic Clustering by Partition Confidence Maximisation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Weiwei Liu,et al.  Sparse Embedded k-Means Clustering , 2017, NIPS.

[32]  Jianping Yin,et al.  Improved Deep Embedded Clustering with Local Structure Preservation , 2017, IJCAI.

[33]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Zhongming Jin,et al.  Deep Robust Clustering by Contrastive Learning , 2020, ArXiv.

[35]  Ivor W. Tsang,et al.  Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering , 2011, IEEE Transactions on Neural Networks.

[36]  Hujun Bao,et al.  Understanding the Power of Clause Learning , 2009, IJCAI.

[37]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[38]  Lei Wang,et al.  Multiple Kernel k-Means Clustering with Matrix-Induced Regularization , 2016, AAAI.

[39]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[40]  Dhruv Batra,et al.  Joint Unsupervised Learning of Deep Representations and Image Clusters , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[42]  Ya Le,et al.  Tiny ImageNet Visual Recognition Challenge , 2015 .

[43]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[44]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Wei-Yun Yau,et al.  Deep Subspace Clustering with Sparsity Prior , 2016, IJCAI.

[46]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[47]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.

[48]  Lingfeng Wang,et al.  Deep Adaptive Image Clustering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.