Variational Clustering: Leveraging Variational Autoencoders for Image Clustering

Recent advances in deep learning have demonstrated that deep networks can learn strong feature representations for images. Image clustering requires good feature representations both to capture the distribution of the data and to differentiate data points from one another. These two aspects are often handled independently, so traditional feature learning alone does not suffice to partition the data meaningfully. Variational Autoencoders (VAEs) naturally lend themselves to learning data distributions in a latent space. Since we wish to discriminate efficiently between different clusters in the data, we propose a VAE-based method that uses a Gaussian Mixture prior to help cluster the images accurately. We jointly learn the parameters of both the prior and the posterior distributions, so our method represents a true Gaussian Mixture VAE. In this way, our method simultaneously learns a prior that captures the latent distribution of the images and a posterior that helps discriminate well between data points. We also propose a novel reparametrization of the latent space, which consists of a mixture of discrete and continuous variables. One key takeaway is that, unlike existing methods, our method does not use any pre-training or learnt models, which allows it to be trained from scratch in an end-to-end manner and to generalize better across different datasets. We verify the efficacy and generalizability of our method experimentally by achieving state-of-the-art results among unsupervised methods on a variety of datasets. To the best of our knowledge, we are the first to pursue image clustering using VAEs in a purely unsupervised manner on real image datasets.
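
The abstract does not spell out the proposed reparametrization, so the following is only a minimal sketch of one common way to sample from a latent space that mixes a discrete cluster variable with a continuous code. It assumes a Gumbel-softmax relaxation for the cluster assignment and the standard Gaussian reparametrization trick for the continuous part; these choices, along with the names `sample_gumbel_softmax`, `sample_latent`, `logits`, `means`, `log_vars`, and `temperature`, are illustrative assumptions rather than the authors' formulation.

```python
import math
import random


def sample_gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot cluster assignment (Gumbel-softmax, assumed here)."""
    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
    noisy = [
        (logit - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))) / temperature
        for logit in logits
    ]
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_latent(logits, means, log_vars, temperature=0.5, rng=None):
    """Sample (cluster weights, continuous code z) from a Gaussian-mixture latent.

    logits:   unnormalized cluster scores, length K
    means:    K lists of length D, one mean vector per mixture component
    log_vars: K lists of length D, one log-variance vector per component
    """
    rng = rng or random.Random(0)
    weights = sample_gumbel_softmax(logits, temperature, rng)
    dim = len(means[0])
    # Blend the per-component Gaussian parameters using the relaxed assignment.
    mu = [sum(w * means[k][d] for k, w in enumerate(weights)) for d in range(dim)]
    log_var = [sum(w * log_vars[k][d] for k, w in enumerate(weights)) for d in range(dim)]
    # Gaussian reparametrization: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [mu[d] + math.exp(0.5 * log_var[d]) * rng.gauss(0.0, 1.0) for d in range(dim)]
    return weights, z


if __name__ == "__main__":
    # Hypothetical toy setup: three clusters in a two-dimensional latent space.
    toy_means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]]
    toy_log_vars = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    weights, z = sample_latent([0.1, 2.0, -1.0], toy_means, toy_log_vars)
    print("cluster weights:", weights)
    print("latent code:", z)
```

In a trained model the logits, means, and log-variances would come from learned networks or learned prior parameters; here they are fixed toy values, and a lower `temperature` pushes the cluster weights closer to a hard one-hot assignment.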
