MTP: Multi-hypothesis Tracking and Prediction for Reduced Error Propagation

Recently, there has been tremendous progress in developing each individual module of the standard perception-planning robot autonomy pipeline, including detection, tracking, prediction of other agents' trajectories, and ego-agent trajectory planning. Nevertheless, less attention has been given to the principled integration of these components, particularly the characterization and mitigation of cascading errors. This paper addresses the problem of cascading errors by focusing on the coupling between the tracking and prediction modules. First, using state-of-the-art tracking and prediction tools, we conduct a comprehensive experimental evaluation of how severely errors stemming from tracking can impact prediction performance. On the KITTI and nuScenes datasets, we find that predictions consuming tracked trajectories as inputs (the typical case in practice) can experience a significant (even order-of-magnitude) drop in performance compared to the idealized setting where ground truth past trajectories are used as inputs. To address this issue, we propose a multi-hypothesis tracking and prediction framework. Rather than relying on a single set of tracking results for prediction, our framework simultaneously reasons about multiple sets of tracking results, thereby increasing the likelihood of including accurate tracking results as inputs to prediction. We show that this framework improves overall prediction performance over the standard single-hypothesis tracking-prediction pipeline by up to 34.2% on the nuScenes dataset, with even more significant improvements (up to ∼70%) when restricting the evaluation to challenging scenarios involving identity switches and fragments, all with a relatively minor computational overhead. Our project page is here: https://www.xinshuoweng.com/projects/MTP.
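To make the multi-hypothesis idea concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes a brute-force enumeration of the k lowest-cost track-to-detection assignments in place of a real multi-hypothesis tracker, and a constant-velocity extrapolation in place of a learned predictor. All function names, coordinates, and the identity-switch scenario are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from math import hypot

def top_k_associations(tracks, detections, k):
    """Brute-force the k lowest-cost one-to-one track->detection assignments.

    Illustrative assumption: cost is Euclidean distance from each track's
    last position to its assigned detection.
    """
    scored = []
    for perm in permutations(range(len(detections))):
        total = sum(
            hypot(detections[j][0] - tracks[i][-1][0],
                  detections[j][1] - tracks[i][-1][1])
            for i, j in enumerate(perm)
        )
        scored.append((total, perm))
    scored.sort()
    return [perm for _, perm in scored[:k]]

def constant_velocity_predict(history, horizon):
    """Stand-in predictor: extrapolate the last observed displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]

def ade(pred, truth):
    """Average displacement error between two equal-length trajectories."""
    return sum(hypot(px - tx, py - ty)
               for (px, py), (tx, ty) in zip(pred, truth)) / len(truth)

# Hypothetical scene: two agents on parallel paths, two ambiguous detections.
tracks = [[(0.0, 0.0), (1.0, 0.0)],   # agent A
          [(0.0, 2.0), (1.0, 2.0)]]   # agent B
detections = [(2.0, 0.75), (2.0, 1.25)]

# Assumed ground truth: the agents crossed, so A really produced detection 1,
# which is the higher-cost association (an identity switch for a greedy tracker).
truth_future_a = [(3.0, 2.5), (4.0, 3.75)]

hypotheses = top_k_associations(tracks, detections, k=2)

# Predict agent A's future under each association hypothesis.
ades = []
for perm in hypotheses:
    history_a = tracks[0] + [detections[perm[0]]]
    ades.append(ade(constant_velocity_predict(history_a, 2), truth_future_a))

single_ade = ades[0]   # prediction from the single best-cost hypothesis
best_ade = min(ades)   # best prediction across all kept hypotheses
print(single_ade, best_ade)
```

In this toy case the lowest-cost association is the wrong one, so the single-hypothesis prediction inherits the tracking error, while keeping a second hypothesis lets the correct past trajectory reach the predictor.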
