GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering

We propose a self-supervised Gaussian ATtention network for image Clustering (GATCluster). Rather than extracting intermediate features first and then performing the traditional clustering algorithm, GATCluster directly outputs semantic cluster labels without further post-processing. Theoretically, we give a Label Feature Theorem to guarantee the learned features are one-hot encoded vectors, and the trivial solutions are avoided. To train the GATCluster in a completely unsupervised manner, we design four self-learning tasks with the constraints of transformation invariance, separability maximization, entropy analysis, and attention mapping. Specifically, the transformation invariance and separability maximization tasks learn the relationships between sample pairs. The entropy analysis task aims to avoid trivial solutions. To capture the object-oriented semantics, we design a self-supervised attention mechanism that includes a parameterized attention module and a soft-attention loss. All the guiding signals for clustering are self-generated during the training process. Moreover, we develop a two-step learning algorithm that is memory-efficient for clustering large-size images. Extensive experiments demonstrate the superiority of our proposed method in comparison with the state-of-the-art image clustering benchmarks. Our code has been made publicly available at this https URL.

[1]  Shaogang Gong,et al.  Harmonious Attention Network for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Zhouchen Lin,et al.  Self-Supervised Convolutional Subspace Clustering Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[5]  Pasi Fränti,et al.  Fast Agglomerative Clustering Using a k-Nearest Neighbor Graph , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[7]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Fei Wang,et al.  Deep Comprehensive Correlation Mining for Image Clustering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[11]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[12]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[13]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[14]  Yun Fu,et al.  Tell Me Where to Look: Guided Attention Inference Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Yann LeCun,et al.  Stacked What-Where Auto-encoders , 2015, ArXiv.

[16]  Murray Shanahan,et al.  Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders , 2016, ArXiv.

[17]  Bo Zhang,et al.  Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders , 2017, Pattern Recognit..

[18]  Shuigeng Zhou,et al.  DeepCluster: A General Clustering Framework Based on Deep Learning , 2017, ECML/PKDD.

[19]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[20]  Tong Zhang,et al.  Deep Subspace Clustering Networks , 2017, NIPS.

[21]  M. Cugmas,et al.  On comparing partitions , 2015 .

[22]  Chia-Wen Lin,et al.  CNN-Based Joint Clustering and Representation Learning with Feature Drift Compensation for Large-Scale Image Data , 2017, IEEE Transactions on Multimedia.

[23]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[24]  Huachun Tan,et al.  Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering , 2016, IJCAI.

[25]  Wei Wang,et al.  Deep Embedding Network for Clustering , 2014, 2014 22nd International Conference on Pattern Recognition.

[26]  Bo Yang,et al.  Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering , 2016, ICML.

[27]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[29]  Tao Li,et al.  The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[30]  Lingfeng Wang,et al.  Deep Adaptive Image Clustering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Masashi Sugiyama,et al.  Learning Discrete Representations via Information Maximizing Self-Augmented Training , 2017, ICML.

[32]  Changming Sun,et al.  An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Cheng Deng,et al.  Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[36]  Jost Tobias Springenberg,et al.  Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks , 2015, ICLR.

[37]  Lei Zhang,et al.  Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Jiashi Feng,et al.  Deep Adversarial Subspace Clustering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Xu Ji,et al.  Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Daniel Cremers,et al.  Associative Deep Clustering: Training a Classification Network with No Labels , 2018, GCPR.

[42]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[43]  Dhruv Batra,et al.  Joint Unsupervised Learning of Deep Representations and Image Clusters , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jiancheng Lv,et al.  Unsupervised Multi-Manifold Clustering by Learning Deep Representation , 2017, AAAI Workshops.

[45]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[46]  Paolo Favaro,et al.  Representation Learning by Learning to Count , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[49]  Deyu Meng,et al.  DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[51]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[52]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[53]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[54]  Hujun Bao,et al.  Understanding the Power of Clause Learning , 2009, IJCAI.