CGMVAE: Coupling GMM Prior and GMM Estimator for Unsupervised Clustering and Disentanglement

Deep unsupervised clustering and feature disentanglement have recently seen impressive progress. In this paper, we propose a method built on a recent architecture, with a novel interpretation of Gaussian mixture model (GMM) membership and an accompanying GMM loss that enhances clustering. The GMM loss is optimized using explicitly computed parameters within our coupled-GMM framework. Specifically, our model takes advantage of both implicitly learning a GMM in latent space with neural networks (the GMM prior, as the first GMM) and explicitly clustering with a second GMM (the GMM estimator). We further introduce a Dirichlet conjugate loss as a regularization term to prevent the GMM estimator from degenerating to a few Gaussians. Finally, we propose an apparel generation application based on the proposed method, which requires only three selection steps. Extensive experiments on publicly available datasets demonstrate the effectiveness of our method in terms of clustering and disentanglement performance.
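To make the idea of a GMM estimator with explicitly computed parameters more concrete, the following is a minimal NumPy sketch, not the formulation used in the paper. It assumes diagonal covariances, soft memberships `gamma` supplied by some network, and a symmetric Dirichlet prior with concentration `alpha` on the mixing weights; the function names and all of these choices are illustrative assumptions.

```python
import numpy as np

def gmm_params(z, gamma, eps=1e-6):
    """Closed-form GMM parameters from latent codes z (N, D) and
    soft memberships gamma (N, K). Diagonal covariance is an assumption."""
    nk = gamma.sum(axis=0)                          # effective count per component, (K,)
    pi = nk / nk.sum()                              # mixing weights, (K,)
    mu = (gamma.T @ z) / (nk[:, None] + eps)        # component means, (K, D)
    diff2 = (z[:, None, :] - mu[None, :, :]) ** 2   # squared deviations, (N, K, D)
    var = (gamma[:, :, None] * diff2).sum(axis=0) / (nk[:, None] + eps) + eps
    return pi, mu, var

def gmm_nll(z, pi, mu, var):
    """Mean negative log-likelihood of z under a diagonal-covariance GMM."""
    diff2 = (z[:, None, :] - mu[None, :, :]) ** 2
    log_comp = -0.5 * (np.log(2.0 * np.pi * var)[None] + diff2 / var[None]).sum(axis=2)
    log_mix = np.log(pi + 1e-12)[None, :] + log_comp    # (N, K)
    m = log_mix.max(axis=1, keepdims=True)              # stabilised log-sum-exp
    lse = m[:, 0] + np.log(np.exp(log_mix - m).sum(axis=1))
    return -lse.mean()

def dirichlet_penalty(pi, alpha=2.0):
    """Negative log-density (up to a constant) of a symmetric Dirichlet prior.
    For alpha > 1 it grows as any mixing weight approaches zero."""
    return -((alpha - 1.0) * np.log(pi + 1e-12)).sum()

# Toy usage with random latents and random soft memberships (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 4))
logits = rng.normal(size=(200, 3))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

pi, mu, var = gmm_params(z, gamma)
loss = gmm_nll(z, pi, mu, var) + 0.1 * dirichlet_penalty(pi)  # 0.1 is an arbitrary weight
```

Under these assumptions the penalty is smallest when the mixing weights are uniform, which is one simple way a Dirichlet term can discourage the estimator from collapsing onto a few components.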
