论文信息 - FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor to a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our code/models/demo can be found at https://github.com/kkanshul/finegan

Yong Jae Lee | Krishna Kumar Singh | Utkarsh Ojha | Utkarsh Ojha

[1] Andrew Zisserman,et al. Symbiotic Segmentation and Part Localization for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[2] Ahmed M. Elgammal,et al. SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Yann LeCun,et al. Energy-based Generative Adversarial Networks , 2016, ICLR.

[4] Alexei A. Efros,et al. Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[6] Lior Wolf,et al. Unsupervised Cross-Domain Image Generation , 2016, ICLR.

[7] Li Fei-Fei,et al. Image Generation from Scene Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[9] Yuxin Peng,et al. Fine-Grained Image Classification via Combining Vision and Language , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Timnit Gebru,et al. Fine-Grained Recognition in the Wild: A Multi-task Domain Adaptation Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11] Joshua B. Tenenbaum,et al. Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[12] Bernt Schiele,et al. Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[14] Matthias Zwicker,et al. Disentangling Factors of Variation by Mixing Them , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15] Cheng Deng,et al. Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Arnold W. M. Smeulders,et al. Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[17] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[18] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[19] Rama Chellappa,et al. Semi-supervised FusedGAN for Conditional Image Generation , 2018, ECCV.

[20] Fei-Fei Li,et al. Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[21] Lei Zhang,et al. Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22] Shu Kong,et al. Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Fei-Fei Li,et al. Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[24] Alexei A. Efros,et al. Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Ce Liu,et al. Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[27] Tao Mei,et al. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28] Subhransu Maji,et al. Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Vighnesh Birodkar,et al. Unsupervised Learning of Disentangled Representations from Video , 2017, NIPS.

[30] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Pietro Perona,et al. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[33] Peter N. Belhumeur,et al. POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Yong Jae Lee,et al. Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.

[35] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[36] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[37] Trevor Darrell,et al. Unsupervised Learning of Categories from Sets of Partially Matching Image Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[38] Qi Wu,et al. Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.

[40] Léon Bottou,et al. Wasserstein GAN , 2017, ArXiv.

[41] Byoung-Tak Zhang,et al. Generating Images Part by Part with Composite Generative Adversarial Networks , 2016, ArXiv.

[42] Dhruv Batra,et al. LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation , 2016, ICLR.

[43] Rishi Sharma,et al. A Note on the Inception Score , 2018, ArXiv.

[44] Yang Gao,et al. Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.

[46] Honglak Lee,et al. Attribute2Image: Conditional Image Generation from Visual Attributes , 2015, ECCV.

[47] Geoffrey E. Hinton,et al. Transforming Auto-Encoders , 2011, ICANN.

[48] Pieter Abbeel,et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[49] Dhruv Batra,et al. Joint Unsupervised Learning of Deep Representations and Image Clusters , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Hugo Larochelle,et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Larry S. Davis,et al. Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition , 2016, ArXiv.

[52] Xiaogang Wang,et al. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53] Ali Farhadi,et al. Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[54] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Yann LeCun,et al. Energy-based Generative Adversarial Network , 2016, ICLR.

[56] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.

[57] Trevor Darrell,et al. Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[58] Hui Jiang,et al. Generating images with recurrent adversarial networks , 2016, ArXiv.

[59] David W. Jacobs,et al. Dog Breed Classification Using Part Localization , 2012, ECCV.

[60] Xin Liu,et al. Document clustering based on non-negative matrix factorization , 2003, SIGIR.