Development of human motion prediction strategy using inception residual block

Human motion prediction is a crucial task in computer vision and robotics, with applications in areas such as human-robot interaction, human action tracking for airport security systems, autonomous vehicle navigation, and computer gaming. However, predicting human motion based on past actions is extremely challenging because spatial and temporal features are difficult to detect correctly. To capture temporal features in human poses, we propose an Inception Residual Block (IRB), motivated by the inception architecture's inherent ability to process multiple kernels in parallel and capture salient features. Specifically, we apply multiple 1-D Convolutional Neural Networks (CNNs) with different kernel sizes and input sequence lengths, and concatenate their outputs to obtain a suitable embedding. Because the kernels stride over different receptive fields, they detect both small and large salient features at multiple temporal scales. Our main contribution is a residual connection between the input and the output of the inception block, which maintains continuity between the previously observed pose and the next predicted pose. With this architecture, the model learns prior knowledge about human poses more effectively and achieves substantially higher prediction accuracy, as detailed in the paper. We further feed the output of the inception residual block into a Graph Convolutional Network (GCN), owing to its stronger capability for learning spatial features. We perform a parametric analysis to guide the design of our model, and then evaluate our approach on the Human3.6M dataset, comparing both short-term and long-term predictions against state-of-the-art methods. Our model outperforms these methods on most of the reported pose results; the reasons for this are discussed in detail in the paper.
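The inception residual block described above can be illustrated with a minimal, dependency-free Python sketch operating on the trajectory of a single joint coordinate. Everything here is an illustrative assumption rather than the authors' implementation: the branch configuration (`branches` as `(seq_len, kernel)` pairs), the dense projection that maps the concatenated features back to the input length so the residual sum is well defined, and all function names are hypothetical. In a real model the kernels and projection weights would be learned.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def conv1d_valid(x: Sequence[float], kernel: Sequence[float], stride: int = 1) -> Vector:
    """1-D 'valid' cross-correlation, the operation a Conv1d layer computes (no bias, no padding)."""
    k = len(kernel)
    if k > len(x):
        raise ValueError("kernel is longer than the input window")
    return [
        sum(kernel[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def inception_features(x: Sequence[float], branches: Sequence[Tuple[int, Sequence[float]]]) -> Vector:
    """Run each branch on the last `seq_len` observed frames and concatenate the branch outputs."""
    features: Vector = []
    for seq_len, kernel in branches:
        if seq_len > len(x):
            raise ValueError("branch asks for more frames than were observed")
        features.extend(conv1d_valid(list(x)[-seq_len:], kernel))
    return features


def linear(v: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Dense layer: returns weight @ v + bias."""
    if any(len(row) != len(v) for row in weight):
        raise ValueError("weight rows must match the feature length")
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def inception_residual_block(x, branches, weight, bias) -> Vector:
    """Inception features -> projection back to len(x) -> residual sum with the input."""
    projected = linear(inception_features(x, branches), weight, bias)
    if len(projected) != len(x):
        raise ValueError("projection must map back to the input length")
    return [xi + pi for xi, pi in zip(x, projected)]


# Hypothetical trajectory of one joint coordinate over 6 observed frames.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Two branches: a 2-tap kernel over all 6 frames, a 3-tap kernel over the last 4 frames.
branches = [(6, [0.5, 0.5]), (4, [1.0, 0.0, -1.0])]
feats = inception_features(x, branches)  # [0.5, 1.5, 2.5, 3.5, 4.5, -2.0, -2.0]
# Zero-initialised projection: the residual path passes the input straight through.
out = inception_residual_block(x, branches, [[0.0] * 7 for _ in range(6)], [0.0] * 6)  # equals x
```

With the projection weights at zero, the block returns its input unchanged. This shows how the residual connection keeps the output anchored to the observed poses while the multi-scale branches only need to learn a correction on top of them.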
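The graph-convolution stage can likewise be sketched as a single layer of the form $H' = \sigma(\hat{A} H W)$, where each row of $H$ holds the temporal features of one joint (for example, the output of the inception residual block). This sketch assumes a fixed skeleton adjacency with self-loops and symmetric normalization $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, in the style of Kipf and Welling; the number of layers, the adjacency (fixed or learned), and the activation used in the paper may differ, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(edges: Sequence[Tuple[int, int]], num_nodes: int) -> Matrix:
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(
    adj_norm: Matrix,
    h: Matrix,
    w: Matrix,
    activation: Callable[[float], float] = math.tanh,
) -> Matrix:
    """One graph convolution: activation(adj_norm @ h @ w), mixing features of neighbouring joints."""
    return [[activation(v) for v in row] for row in matmul(matmul(adj_norm, h), w)]


# Hypothetical 3-joint chain (e.g. hip, knee, ankle) with one feature per joint.
adj = normalized_adjacency([(0, 1), (1, 2)], 3)
h = [[1.0], [1.0], [1.0]]
out = gcn_layer(adj, h, [[1.0]], activation=lambda v: v)
```

Each output row blends a joint's own features with those of its skeletal neighbours, weighted by the normalized adjacency; this is the spatial counterpart to the temporal mixing done by the 1-D convolutions.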