Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Recently there has been increased interest in unsupervised learning of disentangled representations using the Variational Autoencoder (VAE) framework. Most existing work has focused on modifying the variational cost function to achieve this goal. We first show that these modifications, e.g. beta-VAE, amplify the tendency of variational inference to underfit, causing pathological over-pruning and over-orthogonalization of the learned components. Second, we propose a complementary approach: modifying the probabilistic model itself with a structured latent prior. This prior makes it possible to discover latent representations that are structured into a hierarchy of independent vector spaces. The proposed prior has three major advantages. First, in contrast to the standard normal prior of the VAE, the proposed prior is not rotationally invariant; this resolves the rotational unidentifiability of the standard normal prior. Second, we demonstrate that the proposed prior encourages independence between the latent subspaces, which facilitates learning of disentangled representations. Third, extensive quantitative experiments demonstrate that the prior significantly improves the trade-off between reconstruction loss and disentanglement relative to the state of the art.
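
To make the structure of such a prior concrete, the following is a minimal sketch in the spirit of the independent subspace analysis literature (an Lp-nested symmetric density); the partition of the latent vector into K subspaces z = (z_1, ..., z_K) with dimensions d_i and the shape parameters p_0, p_i are illustrative assumptions, not details given in this abstract:

$$p(\mathbf{z}) \;\propto\; \exp\!\bigl(-f(\mathbf{z})^{p_0}\bigr), \qquad f(\mathbf{z}) \;=\; \Biggl(\,\sum_{i=1}^{K}\Bigl(\sum_{j=1}^{d_i} |z_{i,j}|^{p_i}\Bigr)^{p_0/p_i}\Biggr)^{1/p_0}.$$

For p_0 = p_i = 2 this collapses (up to a variance rescaling) to the isotropic Gaussian prior of the standard VAE, which is rotationally invariant. For other choices of the exponents the level sets of the density are no longer spheres, so the prior is not invariant under arbitrary rotations of the latent space; and when p_0 differs from the p_i, the density additionally depends on the grouping of coordinates into subspaces. This is the identifiability property the abstract emphasizes.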
