TNT: Target-driveN Trajectory Prediction

Predicting the future behavior of moving agents is essential for real-world applications. It is challenging because the intent of the agent and its corresponding behavior are unknown and intrinsically multimodal. Our key insight is that, for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states. This leads to our target-driven trajectory prediction (TNT) framework. TNT has three stages, which are trained end-to-end. It first predicts an agent's potential target states $T$ steps into the future by encoding its interactions with the environment and the other agents. TNT then generates trajectory state sequences conditioned on targets. A final stage estimates trajectory likelihoods and selects a compact set of trajectory predictions. This is in contrast to previous work, which models agent intents as latent variables and relies on test-time sampling to generate diverse trajectories. We benchmark TNT on trajectory prediction of vehicles and pedestrians, where we outperform the state of the art on the Argoverse Forecasting, INTERACTION, and Stanford Drone datasets and on an in-house Pedestrian-at-Intersection dataset.
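The three stages described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names, the hand-written `score_fn`, the straight-line trajectory generator, and the endpoint-distance de-duplication are hypothetical stand-ins for components that TNT learns end-to-end.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def predict_targets(candidates, score_fn, m):
    """Stage 1 (sketch): score candidate target states, keep the m most likely."""
    probs = softmax(np.array([score_fn(c) for c in candidates]))
    top = np.argsort(-probs)[:m]
    return candidates[top], probs[top]

def generate_trajectory(start, target, horizon):
    """Stage 2 (sketch): one trajectory per target.
    Assumption: a straight line stands in for a learned motion estimator."""
    steps = np.linspace(0.0, 1.0, horizon + 1)[1:, None]
    return start + steps * (target - start)

def select_trajectories(trajs, scores, k, min_dist):
    """Stage 3 (sketch): rank by score, then greedily drop near-duplicates.
    Assumption: duplicates are detected by distance between endpoints."""
    kept = []
    for i in np.argsort(-scores):
        if all(np.linalg.norm(trajs[i][-1] - trajs[j][-1]) >= min_dist for j in kept):
            kept.append(i)
        if len(kept) == k:
            break
    return [trajs[i] for i in kept]

# Toy usage: four candidate targets around an agent at the origin.
start = np.array([0.0, 0.0])
candidates = np.array([[10.0, 0.0], [10.0, 0.5], [0.0, 10.0], [-10.0, 0.0]])
score_fn = lambda c: c[0] + 0.1 * c[1]  # hypothetical score favoring forward motion
targets, probs = predict_targets(candidates, score_fn, m=3)
trajs = [generate_trajectory(start, t, horizon=5) for t in targets]
final = select_trajectories(trajs, probs, k=2, min_dist=1.0)
```

In this toy setup the two nearly identical forward targets collapse into one prediction, so the final set covers two distinct modes without sampling latent variables at test time.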
