Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

Predicting the future trajectories of multiple interacting agents in a scene has become an increasingly important problem for many applications, ranging from the control of autonomous vehicles and social robots to security and surveillance. The problem is compounded by the social interactions between humans and by their physical interactions with the scene. While the existing literature has explored some of these cues, it has largely ignored the multimodal nature of each human's future trajectory. In this paper, we present Social-BiGAT, a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by better modelling the social interactions of pedestrians in a scene. Our method is based on a graph attention network (GAT) that learns reliable feature representations encoding the social interactions between humans in the scene, and a recurrent encoder-decoder architecture that is trained adversarially to predict the humans' paths from these features. We explicitly account for the multimodal nature of the prediction problem by forming a reversible transformation between each scene and its latent noise vector, as in Bicycle-GAN. We show that our framework achieves state-of-the-art performance compared with several baselines on existing trajectory forecasting benchmarks.
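The abstract names the graph attention mechanism but does not spell it out. As a rough illustration only, the following pure-Python sketch implements one single-head layer of the standard graph attention formulation over a fully connected graph of pedestrians. The fully connected graph with self-loops, the single attention head, the ELU output nonlinearity and every function name (`gat_layer`, `matvec`, `softmax`, and so on) are assumptions made for this example; it is not the exact layer used in Social-BiGAT.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    return sum(p * q for p, q in zip(x, y))


def matvec(W: Matrix, x: Vector) -> Vector:
    """Apply a linear map W (out_dim rows, in_dim columns) to x."""
    return [dot(row, x) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h: List[Vector], W: Matrix, a: Vector) -> Tuple[List[Vector], Matrix]:
    """Hypothetical single-head graph attention over a fully connected pedestrian graph.

    h: one feature vector per pedestrian (e.g. an encoder hidden state).
    W: projection shared by all pedestrians.
    a: attention parameters of length 2 * out_dim, split as [a_self, a_other].
    Returns (new per-pedestrian features, attention matrix alpha[i][j]).
    """
    z = [matvec(W, h_i) for h_i in h]  # project every pedestrian
    out_dim = len(W)
    a_self, a_other = a[:out_dim], a[out_dim:]

    features, attention = [], []
    for z_i in z:
        # e_ij = LeakyReLU(a^T [z_i || z_j]); the graph includes self-loops (j == i)
        scores = [leaky_relu(dot(a_self, z_i) + dot(a_other, z_j)) for z_j in z]
        alpha = softmax(scores)  # normalise over all neighbours j
        # attention-weighted sum of projected neighbour features, then ELU
        agg = [sum(alpha[j] * z[j][k] for j in range(len(z))) for k in range(out_dim)]
        features.append([elu(v) for v in agg])
        attention.append(alpha)
    return features, attention


if __name__ == "__main__":
    # Three pedestrians with 2-D toy features and an identity projection.
    # Zero attention parameters give uniform weights of 1/3, so each output
    # is the mean of the projected features (about 0.667 per coordinate).
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    feats, att = gat_layer(h, W, [0.0, 0.0, 0.0, 0.0])
    print(att[0])
    print(feats[0])
```

In a full model the inputs `h` would come from the recurrent encoder rather than toy data. Because the attention weights are normalised separately for each pedestrian, every agent learns how strongly to weight each other agent instead of pooling all neighbours uniformly.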
