Paper information - Linearizing Visual Processes with Convolutional Variational Autoencoders

Linearizing Visual Processes with Convolutional Variational Autoencoders

This work studies the problem of modeling non-linear visual processes by learning linear generative models from observed sequences. We propose a joint learning framework, combining a Linear Dynamic System and a Variational Autoencoder with convolutional layers. After discussing several conditions for linearizing neural networks, we propose an architecture that allows Variational Autoencoders to simultaneously learn the non-linear observation as well as the linear state-transition from a sequence of observed frames. The proposed framework is demonstrated experimentally in three series of synthesis experiments.
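The model structure named in the abstract, a linear state transition combined with a non-linear observation map, can be illustrated with a small sketch. This is not the authors' architecture: the function `synthesize`, the rotation matrix `A`, the random weights `W`, and the `tanh` stand-in for a trained convolutional decoder are all hypothetical choices made only to show how frames would be synthesized once such a model is available.

```python
import numpy as np

def synthesize(A, decode, h0, steps, noise_std=0.0, rng=None):
    """Roll out h_{t+1} = A h_t + noise and decode every state into a frame."""
    rng = np.random.default_rng(0) if rng is None else rng
    h = np.asarray(h0, dtype=float)
    frames = []
    for _ in range(steps):
        frames.append(decode(h))  # non-linear observation of the current state
        h = A @ h + noise_std * rng.standard_normal(h.shape)  # linear transition
    return np.stack(frames)

# Hypothetical toy setup: a 2-D rotation as the latent dynamics and a fixed
# tanh layer standing in for a trained decoder (assumption, not from the paper).
theta = np.pi / 8
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = np.random.default_rng(1).standard_normal((16, 2))

def decode(h):
    return np.tanh(W @ h).reshape(4, 4)

frames = synthesize(A, decode, h0=[1.0, 0.0], steps=16)
```

With `noise_std=0.0` the rollout is deterministic, so the periodic latent rotation produces a periodic sequence of frames even though each frame depends non-linearly on the state.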

Hao Shen | Alexander Sagel
