Learning Variations in Human Motion via Mix-and-Match Perturbation

Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.

[1]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Zicheng Liu,et al.  HP-GAN: Probabilistic 3D Human Motion Prediction via GAN , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3]  Dario Pavllo,et al.  QuaterNet: A Quaternion-based Recurrent Model for Human Motion , 2018, BMVC.

[4]  Sergey Levine,et al.  VideoFlow: A Flow-Based Generative Model for Video , 2019, ArXiv.

[5]  Otmar Hilliges,et al.  Learning Human Motion Models for Long-Term Predictions , 2017, 2017 International Conference on 3D Vision (3DV).

[6]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Samy Bengio,et al.  Generating Sentences from a Continuous Space , 2015, CoNLL.

[9]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[10]  Hongdong Li,et al.  Action Anticipation By Predicting Future Dynamic Images , 2018, ECCV Workshops.

[11]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[12]  José M. F. Moura,et al.  Few-Shot Human Motion Prediction via Meta-learning , 2018, ECCV.

[13]  Lars Petersson,et al.  Encouraging LSTMs to Anticipate Actions Very Early , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[15]  Gang Hua,et al.  CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Honglak Lee,et al.  Attribute2Image: Conditional Image Generation from Visual Attributes , 2015, ECCV.

[17]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[18]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Lars Petersson,et al.  VIENA2: A Driving Anticipation Dataset , 2018, ACCV.

[20]  Ersin Yumer,et al.  MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics , 2018, ECCV.

[21]  Jitendra Malik,et al.  Recurrent Network Models for Human Dynamics , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Adam Roberts,et al.  Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models , 2017, ICLR.

[23]  Martial Hebert,et al.  The Pose Knows: Video Forecasting by Generating Pose Futures , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Richard Hartley,et al.  Action Anticipation with RBF Kernelized Feature Mapping RNN , 2018, ECCV.

[25]  R. Venkatesh Babu,et al.  BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN , 2018, AAAI.

[26]  Danica Kragic,et al.  Anticipating Many Futures: Online Human Motion Prediction and Generation for Human-Robot Interaction , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[28]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[29]  José M. F. Moura,et al.  Adversarial Geometry-Aware Human Motion Prediction , 2018, ECCV.

[30]  Dario Pavllo,et al.  Modeling Human Motion with Quaternion-Based Neural Networks , 2019, International Journal of Computer Vision.

[31]  Xiao Lin,et al.  Human Motion Modeling using DVGANs , 2018, ArXiv.

[32]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[33]  Michael J. Black,et al.  On Human Motion Prediction Using Recurrent Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Björn Ommer,et al.  A Variational U-Net for Conditional Appearance and Shape Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.