Stochastic Prediction of Multi-Agent Interactions from Partial Observations

We present a method that learns to integrate temporal information from a learned dynamics model with ambiguous visual information from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network (Graph-VRNN), which is trained end-to-end to infer the current state of the (partially observed) world and to forecast future states. We show that our method outperforms various baselines on two sports datasets: one based on real basketball trajectories and one generated by a soccer game engine.
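
To make the idea of a graph-structured variational recurrent step more concrete, the following is a minimal sketch of one time step of a generic VRNN with one hidden state per agent. It is not the architecture described in this paper. Several things are assumptions made purely for illustration: a simple mean over the other agents' hidden states stands in for a learned graph network, a 2-D position stands in for the visual observation, and all helper names (`linear`, `affine`, `aggregate`, `step`), layer sizes, and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Random weights for one affine layer (hypothetical helper, untrained)."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def affine(layer, x):
    w, b = layer
    return x @ w + b

def aggregate(h):
    """For each agent, the mean hidden state of all *other* agents.
    Assumption: this stands in for a learned message-passing network."""
    n = h.shape[0]
    return (h.sum(axis=0, keepdims=True) - h) / max(n - 1, 1)

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0,
        axis=-1,
    )

H, Z, X = 8, 4, 2  # hidden, latent, and observation sizes (arbitrary choices)
params = {
    "prior": linear(2 * H, 2 * Z),      # p(z_t | h_{t-1}, neighbours)
    "post": linear(2 * H + X, 2 * Z),   # q(z_t | h_{t-1}, neighbours, x_t)
    "dec": linear(2 * H + Z, X),        # mean of p(x_t | z_t, h_{t-1}, neighbours)
    "rnn": linear(H + X + Z, H),        # deterministic recurrence
}

def step(h, x, params):
    """One step for all agents. h: (agents, H), x: (agents, X)."""
    ctx = np.concatenate([h, aggregate(h)], axis=-1)
    mu_p, logvar_p = np.split(affine(params["prior"], ctx), 2, axis=-1)
    post_in = np.concatenate([ctx, x], axis=-1)
    mu_q, logvar_q = np.split(affine(params["post"], post_in), 2, axis=-1)
    # Reparameterised sample from the approximate posterior.
    z = mu_q + np.exp(0.5 * logvar_q) * rng.standard_normal(mu_q.shape)
    x_hat = affine(params["dec"], np.concatenate([ctx, z], axis=-1))
    h_next = np.tanh(affine(params["rnn"], np.concatenate([h, x, z], axis=-1)))
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return h_next, x_hat, kl

# Usage: five agents, one step from a zero initial state.
h0 = np.zeros((5, H))
x0 = rng.normal(size=(5, X))
h1, x_hat, kl = step(h0, x0, params)
```

In this sketch, forecasting would correspond to sampling `z` from the prior branch instead of the posterior and feeding `x_hat` back in place of `x`. Training would sum a reconstruction term and the per-agent `kl` over time, as in a standard variational objective.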
