GAN-Poser: an improvised bidirectional GAN model for human motion prediction

We explore a novel method, GAN-Poser, which uses a generator-discriminator framework to predict human motion in less time from an input 3D human skeleton sequence. Rather than the conventional Euclidean loss, we use a frame-wise geodesic loss, which gives a geometrically meaningful and more precise distance measurement. We use a bidirectional GAN framework together with a recursive prediction strategy to avoid mode collapse and to further regularize training. To generate multiple probable human-pose sequences conditioned on a given starting sequence, we also introduce a random extrinsic factor $\Theta$. The discriminator is trained to regress the extrinsic factor $\Theta$, which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. Although the framework is probabilistic, the modified discriminator architecture allows predictions for an intermediate part of the pose sequence to be used as conditioning for predicting the latter part of the sequence. This adversarial learning-based model takes stochasticity into account, and the bidirectional setup provides a new way to evaluate prediction quality against a given test sequence. The resulting method, GAN-Poser, achieves superior performance over state-of-the-art deep learning approaches when evaluated on the standard NTU-RGB-D and Human3.6M datasets.
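To illustrate the difference between a Euclidean and a geodesic distance on poses, the following is a minimal sketch of a frame-wise geodesic loss. It assumes that each joint is represented as a 3x3 rotation matrix and that the geodesic distance is the standard rotation angle on SO(3), $\theta = \arccos\big((\mathrm{tr}(R_1^\top R_2) - 1)/2\big)$; the abstract does not state the exact formulation used in GAN-Poser, and the function names and the array layout `(frames, joints, 3, 3)` are hypothetical choices made for this example.

```python
import numpy as np


def rotation_z(angle):
    """Rotation matrix about the z-axis (helper used only to build examples)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_distance(r_pred, r_true):
    """Rotation angle (radians) between rotation matrices of shape (..., 3, 3)."""
    # Relative rotation R_pred^T @ R_true; its angle is the geodesic distance.
    rel = np.swapaxes(r_pred, -1, -2) @ r_true
    trace = np.trace(rel, axis1=-2, axis2=-1)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_theta)


def framewise_geodesic_loss(pred, true):
    """Sum joint angles within each frame, then average over frames.

    Assumed layout (not taken from the paper): (frames, joints, 3, 3).
    """
    per_joint = geodesic_distance(pred, true)   # shape (frames, joints)
    per_frame = per_joint.sum(axis=-1)          # shape (frames,)
    return per_frame.mean()


if __name__ == "__main__":
    # Two frames, three joints, every joint off by 0.3 rad about the z-axis.
    pred = np.broadcast_to(rotation_z(0.1), (2, 3, 3, 3))
    true = np.broadcast_to(rotation_z(0.4), (2, 3, 3, 3))
    print(framewise_geodesic_loss(pred, true))
```

Unlike a Euclidean distance on the matrix entries, this quantity depends only on the relative rotation between the predicted and target joint orientations, which is the sense in which a geodesic loss is geometrically meaningful.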
