TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in the humans’ interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics (nicknamed TRiPOD) method based on graph attentional networks to model the human-human and human-object interactions both in the input space and the output space (decoded future output). The model is supplemented by a message passing interface over the graphs to fuse these different levels of interactions efficiently. Furthermore, to incorporate a real-world challenge, we propound to learn an indicator representing whether an estimated body joint is visible/invisible at each frame, e.g. due to occlusion or being outside the sensor field of view. Finally, we introduce a new benchmark for this joint task based on two challenging datasets (PoseTrack and 3DPW) and propose evaluation metrics to measure the effectiveness of predictions in the global space, even when there are invisible cases of joints. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.

[1]  Fei-Fei Li,et al.  Socially-Aware Large-Scale Crowd Forecasting , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Silvio Savarese,et al.  SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Marco Pavone,et al.  Trajectron++: Dynamically-Feasible Trajectory Forecasting with Heterogeneous Data , 2020, ECCV.

[5]  Silvio Savarese,et al.  Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Juan Carlos Niebles,et al.  Socially and Contextually Aware Human Motion and Pose Forecasting , 2020, IEEE Robotics and Automation Letters.

[7]  Minh Vo,et al.  Long-term Human Motion Prediction with Scene Context , 2020, ECCV.

[8]  Otmar Hilliges,et al.  Learning Human Motion Models for Long-Term Predictions , 2017, 2017 International Conference on 3D Vision (3DV).

[9]  Manuel Kaufmann,et al.  A Spatio-temporal Transformer for 3D Human Motion Prediction , 2020, 2021 International Conference on 3D Vision (3DV).

[10]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[11]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[12]  Julien Pettré,et al.  Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories With GANs , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[14]  Yi Zhou,et al.  Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis , 2017, ICLR.

[15]  Mani Golparvar-Fard,et al.  Video-Based Motion Trajectory Forecasting Method for Proactive Construction Safety Monitoring Systems , 2020, J. Comput. Civ. Eng..

[16]  Scott Cohen,et al.  Forecasting Human Dynamics from Static Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  David J. Fleet,et al.  This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gaussian Process Dynamical Model , 2007 .

[18]  Zhaoxin Li,et al.  STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Zhihai He,et al.  Reciprocal Learning Networks for Human Trajectory Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Alexandre Alahi,et al.  Human Trajectory Forecasting in Crowds: A Deep Learning Perspective , 2020, IEEE Transactions on Intelligent Transportation Systems.

[21]  Zicheng Liu,et al.  HP-GAN: Probabilistic 3D Human Motion Prediction via GAN , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[22]  Sridha Sridharan,et al.  Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection , 2017, Neural Networks.

[23]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ba-Ngu Vo,et al.  A Consistent Metric for Performance Evaluation of Multi-Object Filters , 2008, IEEE Transactions on Signal Processing.

[25]  Silvio Savarese,et al.  Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks , 2019, NeurIPS.

[26]  Martial Hebert,et al.  The Pose Knows: Video Forecasting by Generating Pose Futures , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Helbing,et al.  Social force model for pedestrian dynamics. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[28]  Marco Cristani,et al.  Transformer Networks for Trajectory Forecasting , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[29]  Juan Carlos Niebles,et al.  Imitation Learning for Human Pose Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Luc Van Gool,et al.  You'll never walk alone: Modeling social behavior for multi-target tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[32]  Nadia Magnenat-Thalmann,et al.  Learning Progressive Joint Propagation for Human Motion Prediction , 2020, ECCV.

[33]  Michael J. Black,et al.  On Human Motion Prediction Using Recurrent Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jitendra Malik,et al.  Recurrent Network Models for Human Dynamics , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Ling Shao,et al.  Leveraging Hierarchical Parametric Networks for Skeletal Joints Based Action Segmentation and Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[39]  Gary Tan,et al.  GraphTCN: Spatio-Temporal Interaction Modeling for Human Trajectory Prediction , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[40]  Bolei Zhou,et al.  TPNet: Trajectory Proposal Network for Motion Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yu Tian,et al.  Semantic Graph Convolutional Networks for 3D Human Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[43]  Ye Yuan,et al.  DLow: Diversifying Latent Flows for Diverse Human Motion Prediction , 2020, ECCV.

[44]  M. Shah,et al.  Abnormal crowd behavior detection using social force model , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Juan Carlos Niebles,et al.  Action-Agnostic Human Pose Forecasting , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[46]  Abduallah A. Mohamed,et al.  Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Philip H. S. Torr,et al.  DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Mathieu Salzmann,et al.  History Repeats Itself: Human Motion Prediction via Motion Attention , 2020, ECCV.

[49]  Jure Leskovec,et al.  How Powerful are Graph Neural Networks? , 2018, ICLR.

[50]  Francesc Moreno-Noguer,et al.  Context-Aware Human Motion Prediction , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Ersin Yumer,et al.  MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics , 2018, ECCV.

[52]  Leonidas J. Guibas,et al.  Contact and Human Dynamics from Monocular Video , 2020, SCA.

[53]  Bernt Schiele,et al.  PoseTrack: A Benchmark for Human Pose Estimation and Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Michael H. Schwartz,et al.  Deep neural networks enable quantitative movement analysis using single-camera videos , 2020, Nature Communications.

[55]  Torsten Bertram,et al.  Online trajectory prediction and planning for social robot navigation , 2017, 2017 IEEE International Conference on Advanced Intelligent Mechatronics (AIM).

[56]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[57]  J. Malik,et al.  It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction , 2020, ECCV.

[58]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[59]  Juan Carlos Niebles,et al.  Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[60]  Yue Hu,et al.  Collaborative Motion Prediction via Neural Motion Message Passing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Yanfeng Wang,et al.  Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Marco Pavone,et al.  The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[63]  Lars Petersson,et al.  A Stochastic Conditioning Scheme for Diverse Human Motion Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Shenghua Gao,et al.  Future Frame Prediction for Anomaly Detection - A New Baseline , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65]  Luis E. Ortiz,et al.  Who are you with and where are you going? , 2011, CVPR 2011.