Temporal Invariant Factor Disentangled Model for Representation Learning

This paper focuses on disentangling different kinds of underlying explanatory factors from image sequences. From a temporal perspective, we divide the explanatory factors into a temporal-invariant factor and a temporal-variant factor. The temporal-invariant factor corresponds to the categorical concept of the objects in an image sequence, while the temporal-variant factor describes how the objects' appearance changes. We propose a disentangled model that extracts the temporal-invariant factor from an image sequence and uses it as an object representation that is insensitive to appearance changes. Our model builds on the variational auto-encoder (VAE) and the recurrent neural network (RNN) to approximate the posterior distributions of the two factors independently, in an unsupervised manner. Experimental results on the HeadPose image database demonstrate the effectiveness of the proposed method.
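The factorization described above can be illustrated with a minimal sketch of a generic factorized sequence VAE. It is not the authors' model. It rests on several assumptions the abstract does not state: a single-layer tanh RNN reads the whole sequence to parameterize the temporal-invariant posterior q(z_f | x_1..x_T), a separate per-frame encoder parameterizes the temporal-variant posterior q(z_t | x_t), both priors are standard normal, the decoder is linear with a unit-variance Gaussian likelihood, and the dimensions are toy values. All names (`SequenceVAE`, `encode_invariant`, `encode_variant`, `kl_std_normal`) are hypothetical. The sketch computes a single-sample ELBO for one sequence with random, untrained weights.

```python
import math
import random

# Hypothetical toy sizes; the abstract does not specify any dimensions.
X_DIM, H_DIM, Z_DIM = 4, 3, 2


def randmat(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def kl_std_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def sample(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


class SequenceVAE:
    """Hypothetical sketch: one sequence-level latent z_f plus one latent z_t per frame."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.W_xh = randmat(H_DIM, X_DIM, rng)      # RNN input weights
        self.W_hh = randmat(H_DIM, H_DIM, rng)      # RNN recurrent weights
        self.W_f_mu = randmat(Z_DIM, H_DIM, rng)    # q(z_f | x_1..x_T) mean head
        self.W_f_lv = randmat(Z_DIM, H_DIM, rng)    # q(z_f | x_1..x_T) log-variance head
        self.W_t_mu = randmat(Z_DIM, X_DIM, rng)    # q(z_t | x_t) mean head
        self.W_t_lv = randmat(Z_DIM, X_DIM, rng)    # q(z_t | x_t) log-variance head
        self.W_dec = randmat(X_DIM, 2 * Z_DIM, rng)  # linear decoder on [z_f; z_t]

    def encode_invariant(self, frames):
        # Run a tanh RNN over the whole sequence; the last hidden state
        # parameterizes the temporal-invariant posterior.
        h = [0.0] * H_DIM
        for x in frames:
            h = [math.tanh(v) for v in add(matvec(self.W_xh, x), matvec(self.W_hh, h))]
        return matvec(self.W_f_mu, h), matvec(self.W_f_lv, h)

    def encode_variant(self, x):
        # Per-frame posterior, computed independently of z_f.
        return matvec(self.W_t_mu, x), matvec(self.W_t_lv, x)

    def decode(self, z_f, z_t):
        return matvec(self.W_dec, z_f + z_t)  # list concatenation [z_f; z_t]

    def elbo(self, frames, rng):
        mu_f, lv_f = self.encode_invariant(frames)
        z_f = sample(mu_f, lv_f, rng)  # one sample shared by every frame
        total = -kl_std_normal(mu_f, lv_f)
        for x in frames:
            mu_t, lv_t = self.encode_variant(x)
            z_t = sample(mu_t, lv_t, rng)
            x_hat = self.decode(z_f, z_t)
            # Unit-variance Gaussian log-likelihood log p(x_t | z_f, z_t).
            log_lik = -0.5 * sum((a - b) ** 2 + math.log(2 * math.pi) for a, b in zip(x, x_hat))
            total += log_lik - kl_std_normal(mu_t, lv_t)
        return total


model = SequenceVAE(seed=0)
data_rng = random.Random(1)
sequence = [[data_rng.gauss(0.0, 1.0) for _ in range(X_DIM)] for _ in range(5)]
print(f"single-sample ELBO: {model.elbo(sequence, data_rng):.3f}")
```

In this sketch, sharing a single sample of z_f across every frame is what makes it temporal-invariant, while each z_t is free to absorb frame-specific appearance changes. Training would maximize this ELBO by gradient ascent, which the sketch omits.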
