InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound on the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. In experiments, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
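
As a rough sketch of the objective summarized above, the mutual information term and its lower bound can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: z is the unstructured noise, c is the small subset of latent variables, G is the generator, D is the discriminator, Q is an auxiliary distribution approximating the posterior of c given the observation, V(D, G) is the standard GAN value function, and \lambda is a weighting hyperparameter.

```latex
% Assumed notation: z = noise, c = latent code subset, G = generator,
% D = discriminator, Q = auxiliary approximate posterior, \lambda = weight.

% Variational lower bound on the mutual information between c and G(z, c):
\begin{align}
I\bigl(c;\, G(z, c)\bigr)
  &= H(c) - H\bigl(c \mid G(z, c)\bigr) \\
  &\ge \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\bigl[\log Q(c \mid x)\bigr] + H(c)
  \;=\; L_I(G, Q)
\end{align}

% GAN minimax game regularized by the bound:
\begin{equation}
\min_{G,\, Q} \; \max_{D} \; V(D, G) - \lambda\, L_I(G, Q)
\end{equation}
```

Under these assumptions the bound is tight when Q(c | x) equals the true posterior P(c | x), and H(c) is a constant when the distribution over c is fixed, so maximizing the bound reduces to maximizing the expected log-likelihood of the code under Q.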
