Human Action Generation with Generative Adversarial Networks

Inspired by recent advances in generative models, we introduce a human action generation model that produces consecutive sequences of human motions to form novel actions. We propose a framework combining an autoencoder and a generative adversarial network (GAN) to generate multiple, consecutive human actions conditioned on the initial state and a given class label. The model is trained end-to-end, with the autoencoder and the GAN optimized jointly. Trained on the NTU RGB+D dataset, the model generates actions in varied styles and can produce a sequence of novel actions when conditioned on different action labels. Conventional human action prediction and generation models lack these capabilities, which are essential for practical applications.
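To make the conditioning scheme concrete, the sketch below shows one plausible shape for a generator that maps a noise vector, a class label, and an initial pose to a motion sequence, in the spirit of the framework the abstract describes. It is a minimal PyTorch illustration: the module names, dimensions, and the exact way the condition is fed to the recurrent network are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of a conditional skeleton-sequence generator (PyTorch).
# All dimensions, names, and the conditioning scheme are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class ConditionalMotionGenerator(nn.Module):
    def __init__(self, num_classes, pose_dim=75, noise_dim=128, hidden_dim=256):
        super().__init__()
        # Fuse noise, class label, and the initial pose into one condition vector.
        self.fuse = nn.Linear(noise_dim + num_classes + pose_dim, hidden_dim)
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.to_pose = nn.Linear(hidden_dim, pose_dim)  # hidden state -> joint coordinates

    def forward(self, z, label_onehot, initial_pose, seq_len=64):
        # Condition every time step on the same fused (z, label, initial pose) code.
        cond = torch.relu(self.fuse(torch.cat([z, label_onehot, initial_pose], dim=-1)))
        steps = cond.unsqueeze(1).expand(-1, seq_len, -1)  # (B, T, hidden_dim)
        hidden, _ = self.rnn(steps)
        return self.to_pose(hidden)                        # (B, T, pose_dim)

# Usage: sample one batch of label-conditioned sequences from random noise.
gen = ConditionalMotionGenerator(num_classes=60)           # NTU RGB+D has 60 action classes
z = torch.randn(8, 128)
labels = torch.eye(60)[torch.randint(0, 60, (8,))]         # one-hot class labels
start = torch.zeros(8, 75)                                 # 25 joints x 3D coordinates
fake_motion = gen(z, labels, start)                        # (8, 64, 75)
```

In a GAN setup of this kind, a discriminator would score such sequences (optionally together with the label, as in conditional GANs), while the autoencoder supplies the pose representation; the joint end-to-end training the abstract mentions ties these pieces together.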
