Looking to Relations for Future Trajectory Forecast

Inferring relational behavior between road users as well as road users and their surrounding physical space is an important step toward effective modeling and prediction of navigation strategies adopted by participants in road scenes. To this end, we propose a relation-aware framework for future trajectory forecast. Our system aims to infer relational information from the interactions of road users with each other and with the environment. The first module involves visual encoding of spatio-temporal features, which captures human-human and human-space interactions over time. The following module explicitly constructs pair-wise relations from spatio-temporal interactions and identifies more descriptive relations that highly influence future motion of the target road user by considering its past trajectory. The resulting relational features are used to forecast future locations of the target, in the form of heatmaps with an additional guidance of spatial dependencies and consideration of the uncertainty. Extensive evaluations on the public benchmark datasets demonstrate the robustness and efficacy of the proposed framework as observed by performances higher than the state-of-the-art methods.

[1]  Zoubin Ghahramani,et al.  Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.

[2]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[3]  Jean Oh,et al.  Social Attention: Modeling Attention in Human Crowds , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[4]  Philip H. S. Torr,et al.  DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Silvio Savarese,et al.  Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Yu Yao,et al.  Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[7]  Fawzi Nashashibi,et al.  Real time trajectory prediction for collision risk estimation between vehicles , 2009, 2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing.

[8]  Silvio Savarese,et al.  SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[10]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[11]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[13]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Jianbo Shi,et al.  Predicting Behaviors of Basketball Players from First Person Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  A. Kiureghian,et al.  Aleatory or epistemic? Does it matter? , 2009 .

[16]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[18]  Luc Van Gool,et al.  You'll never walk alone: Modeling social behavior for multi-target tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Silvio Savarese,et al.  Single-source Attention Path Prediction Multi-source Attention Predicted Observed , 2018 .

[20]  Sujitha Martin,et al.  Goal-oriented Object Importance Estimation in On-road Driving Videos , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[21]  Xiaogang Wang,et al.  Understanding pedestrian behaviors from stationary crowd groups , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Luis E. Ortiz,et al.  Who are you with and where are you going? , 2011, CVPR 2011.

[23]  Shenghua Gao,et al.  Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Helbing,et al.  Social force model for pedestrian dynamics. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[25]  Jianbo Shi,et al.  Egocentric Future Localization , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[27]  Nico Steinhardt,et al.  Intelligent Traffic Flow Assist: Optimized Highway Driving Using Conditional Behavior Prediction , 2020 .

[28]  Soshi Iba,et al.  Dynamic Channel: A Planning Framework for Crowd Navigation , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[29]  Alex Pentland,et al.  Human Computing and Machine Understanding of Human Behavior: A Survey , 2007, Artifical Intelligence for Human Computing.

[30]  Mark Reynolds,et al.  SS-LSTM: A Hierarchical LSTM Model for Pedestrian Trajectory Prediction , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[31]  Li-Chen Fu,et al.  Human-Centered Robot Navigation—Towards a Harmoniously Human–Robot Coexisting Environment , 2011, IEEE Transactions on Robotics.

[32]  Ivan Laptev,et al.  Data-driven crowd analysis in videos , 2011, ICCV.

[33]  Rachid Alami,et al.  Human-aware robot navigation: A survey , 2013, Robotics Auton. Syst..

[34]  Yun Fu,et al.  Human Action Recognition and Prediction: A Survey , 2018, International Journal of Computer Vision.

[35]  Dani Lischinski,et al.  Crowds by Example , 2007, Comput. Graph. Forum.

[36]  Kris M. Kitani,et al.  Forecasting Interactive Dynamics of Pedestrians with Fictitious Play , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yann LeCun,et al.  Transforming Neural-Net Output Levels to Probability Distributions , 1990, NIPS.

[38]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[39]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[40]  John K. Tsotsos,et al.  Joint Attention in Driver-Pedestrian Interaction: from Theory to Practice , 2018, ArXiv.

[41]  Yoichi Sato,et al.  Future Person Localization in First-Person Videos , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[44]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[45]  Qiuming Zhu,et al.  Hidden Markov model for dynamic obstacle avoidance of mobile robot navigation , 1991, IEEE Trans. Robotics Autom..

[46]  John M. Dolan,et al.  Motion planning under uncertainty for on-road autonomous driving , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[47]  Alessio Del Bue,et al.  MX-LSTM: Mixing Tracklets and Vislets to Jointly Forecast Trajectories and Head Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Fei-Fei Li,et al.  Socially-Aware Large-Scale Crowd Forecasting , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).