Paper Information - Multi-Modal Deep Clustering: Unsupervised Partitioning of Images


Clustering unlabeled raw images is a daunting task that deep learning methods have recently approached with some success. Here we propose an unsupervised clustering framework that trains a deep neural network end to end and provides direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC) trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. The cluster assignments are then determined by the mixture component with which each image embedding is associated. Simultaneously, the same deep network is trained to solve an additional self-supervised task: predicting image rotations. This pushes the network to learn more meaningful image representations, which facilitate better clustering. Experimental results show that MMDC matches or exceeds state-of-the-art performance on six challenging benchmarks. On natural image datasets we improve on previous results by significant margins of up to 20 absolute percentage points, reaching an accuracy of 82% on CIFAR-10, 45% on CIFAR-100, and 69% on STL-10.
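The two ingredients named in the abstract, Gaussian-mixture targets with assignment by mixture component, and the rotation-prediction pretext task, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the mixture parameters or the alignment loss, so the sketch assumes an equal-weight mixture with a shared isotropic covariance (in which case the most probable component for an embedding is simply the one with the nearest mean), and it omits network training entirely. All function names and the example means are hypothetical.

```python
import random

def sample_gmm_targets(means, sigma, n, rng):
    """Draw n targets from an equal-weight, isotropic Gaussian mixture (assumed form).

    Returns the targets and the index of the component each one came from.
    """
    targets, components = [], []
    for _ in range(n):
        k = rng.randrange(len(means))  # equal mixture weights
        targets.append([m + rng.gauss(0.0, sigma) for m in means[k]])
        components.append(k)
    return targets, components

def assign_cluster(embedding, means):
    """Return the index of the component whose mean is nearest to the embedding."""
    def sq_dist(mean):
        return sum((e - m) ** 2 for e, m in zip(embedding, mean))
    return min(range(len(means)), key=lambda k: sq_dist(means[k]))

def rotate90(image):
    """Rotate a 2-D list-of-lists image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def rotation_batch(images):
    """Pair every image with its four rotations and labels 0..3 (multiples of 90 degrees)."""
    batch, labels = [], []
    for image in images:
        rotated = image
        for label in range(4):
            batch.append(rotated)
            labels.append(label)
            rotated = rotate90(rotated)
    return batch, labels

# Hypothetical setup: 3 clusters in a 3-D embedding space.
rng = random.Random(0)
means = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
targets, components = sample_gmm_targets(means, sigma=0.1, n=12, rng=rng)

# Stand-in for network embeddings: here the targets themselves are assigned.
assignments = [assign_cluster(t, means) for t in targets]

# Rotation pretext data for a single 2x2 "image".
batch, labels = rotation_batch([[[1, 2], [3, 4]]])
```

In a full pipeline the embeddings would come from the trained network rather than from the targets, and a classifier head on the same network would be trained to predict `labels` from `batch`.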

Daphna Weinshall | Guy Shiran
