Disentangling Factors of Variation via Generative Entangling

We propose a novel model family aimed at learning to disentangle the factors of variation in data. Our approach builds on the spike-and-slab restricted Boltzmann machine, which we generalize to include higher-order interactions among multiple latent variables. Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation; inference in the model can then be seen as disentangling these generative factors. Unlike previous attempts at disentangling latent factors, the proposed model is trained without any supervised information about the latent factors. We apply our model to the task of facial expression classification.
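To make the notion of multiplicative entangling concrete, the following is a minimal toy sketch and not the model's actual energy function: it assumes two binary latent vectors `g` and `h` that interact through a third-order weight tensor `W`, so that the effect of switching on one factor depends on the state of the other. All names (`W`, `g`, `h`, `bilinear_generate`) and the dimensions are hypothetical, chosen only for illustration.

```python
import random


def bilinear_generate(W, g, h):
    """Toy multiplicative generator: x[d] = sum_k sum_l W[d][k][l] * g[k] * h[l].

    This is an illustrative assumption, not the spike-and-slab RBM itself.
    """
    return [
        sum(W[d][k][l] * g[k] * h[l]
            for k in range(len(g))
            for l in range(len(h)))
        for d in range(len(W))
    ]


# Hypothetical sizes: D observed dimensions, K and L latent units.
rng = random.Random(0)
D, K, L = 4, 2, 3
W = [[[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(K)]
     for _ in range(D)]

g_on, g_off = [1, 0], [0, 0]
h_a, h_b = [1, 0, 0], [0, 1, 0]

# Effect on x of switching g[0] on, measured under two settings of h.
effect_a = [p - q for p, q in zip(bilinear_generate(W, g_on, h_a),
                                  bilinear_generate(W, g_off, h_a))]
effect_b = [p - q for p, q in zip(bilinear_generate(W, g_on, h_b),
                                  bilinear_generate(W, g_off, h_b))]

# Under multiplicative interactions the two effects differ: the
# contribution of g is entangled with the state of h.
print(effect_a)
print(effect_b)
```

In a purely additive generator the two printed vectors would be identical, because the contribution of `g` would not depend on `h`. Here they differ, which is the sense in which the factors are entangled in the observation and must be separated again at inference time.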
