AGVNet: Attention Guided Velocity Learning for 3D Human Motion Prediction

Human motion prediction plays a significant role in human-machine interaction across a variety of real-life applications. In this paper, we propose a novel attention-guided velocity learning network, AGVNet, that predicts human motion from multi-order information, namely the positions and velocities derived from the dynamic states of the human body. Unlike existing methods, our network formulates human motion as a dynamic system and predicts future motion from both the position and the velocity of poses. Specifically, a multi-level Encoder models the dynamics of moving joints at the axis level and the joint level. A feedforward Decoder generates future poses recursively by reusing the predictions from previous time-steps and fusing multi-order information from the velocity and position spaces. To limit error accumulation, a dedicated loss function, ATPL (Attention Temporal Prediction Loss), assigns decreasing attention to later predictions, which makes the network more accurate at the early time-steps. Experiments on two benchmark datasets (Human3.6M and 3DPW) confirm that our method achieves state-of-the-art performance. The code will be made public once the paper is accepted.
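The three ideas named in the abstract (velocities derived from positions, recursive decoding that reuses earlier predictions, and a loss that pays less attention to later time-steps) can be illustrated with a minimal NumPy sketch. This is not the AGVNet implementation: the abstract does not give the exact form of ATPL, so the geometric `decay` weighting below is an assumption, and `predict_velocity` is a hypothetical stand-in for the learned decoder step. Poses are assumed to be arrays of shape `(T, J, 3)` (time-steps, joints, axes).

```python
import numpy as np


def velocities(positions):
    """First-order velocity as frame-to-frame differences: v_t = p_t - p_{t-1}."""
    return np.diff(positions, axis=0)


def decreasing_weights(horizon, decay=0.9):
    """Normalized weights that shrink with the prediction step (assumed geometric form)."""
    w = decay ** np.arange(horizon)
    return w / w.sum()


def weighted_temporal_loss(pred, target, decay=0.9):
    """Mean per-joint position error per step, weighted so early steps count more."""
    per_step = np.linalg.norm(pred - target, axis=-1).mean(axis=-1)  # shape (T,)
    return float((decreasing_weights(len(per_step), decay) * per_step).sum())


def rollout(last_pos, last_vel, steps, predict_velocity):
    """Recursive decoding: each step reuses the previous predicted position and velocity.

    `predict_velocity(pos, vel)` is a hypothetical callable standing in for the
    learned decoder; the next pose is obtained by integrating its output.
    """
    poses = []
    pos, vel = last_pos, last_vel
    for _ in range(steps):
        vel = predict_velocity(pos, vel)
        pos = pos + vel
        poses.append(pos)
    return np.stack(poses)


# Usage: a constant-velocity baseline rolled out from an observed sequence.
observed = np.cumsum(np.full((5, 2, 3), 0.1), axis=0)  # 5 frames, 2 joints
vel = velocities(observed)
future = rollout(observed[-1], vel[-1], steps=3, predict_velocity=lambda p, v: v)
loss = weighted_temporal_loss(future, future + 0.05)
```

Because each rollout step feeds on the previous prediction, errors compound over the horizon; down-weighting later steps in the loss is one simple way to keep those compounded errors from dominating training.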
