Learnable Online Graph Representations for 3D Multi-Object Tracking

Tracking objects in 3D is a fundamental task in computer vision, with applications ranging from autonomous driving and robotics to augmented reality. Most recent approaches to 3D multi-object tracking (MOT) from LiDAR combine object dynamics with a set of handcrafted features to match object detections. However, manually designing such features and heuristics is cumbersome and often leads to suboptimal performance. In this work, we instead pursue a unified, learning-based approach to the 3D MOT problem. We design a graph structure that jointly processes detection and track states in an online manner, and we employ a fully trainable Neural Message Passing network for data association on this graph. Our approach provides a natural way to initialize tracks and handle false-positive detections, while significantly improving track stability. We demonstrate the merit of the proposed approach on the publicly available nuScenes dataset, where it achieves state-of-the-art performance of 65.6% AMOTA and 58% fewer ID switches.
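The abstract describes a graph over track and detection states whose edges are scored by a neural message passing network. The sketch below illustrates only that general idea, under clearly stated assumptions: a bipartite track-detection graph gated by 3D center distance (a hypothetical `gate` in metres), tiny dense layers with random, untrained weights, sum aggregation, and a greedy one-to-one matcher. The names `build_edges`, `message_passing`, and `greedy_associate` are hypothetical and do not come from the paper. The paper's actual architecture, features, and training loss are not reproduced here; in practice, the weights would be learned from annotated tracks.

```python
import math
import random

# Hypothetical sketch of neural message passing for track-detection association.
# Weights are random and untrained; this is NOT the paper's architecture.

Vec = list[float]


def relu(v: Vec) -> Vec:
    return [max(0.0, x) for x in v]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class Linear:
    """Dense layer y = W x + b with small random (untrained) weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.w = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: Vec) -> Vec:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


def build_edges(tracks, dets, gate):
    # Assumption: connect a track and a detection only if their 3D centers
    # are closer than `gate` metres.
    return [(i, j) for i, t in enumerate(tracks) for j, d in enumerate(dets)
            if math.dist(t, d) < gate]


def message_passing(tracks, dets, gate=2.0, dim=8, steps=3, seed=0):
    """Return {(track_idx, det_idx): association score in [0, 1]}."""
    rng = random.Random(seed)
    enc_node, enc_edge = Linear(3, dim, rng), Linear(3, dim, rng)
    upd_edge, upd_node = Linear(3 * dim, dim, rng), Linear(2 * dim, dim, rng)
    classify = Linear(dim, 1, rng)

    edges = build_edges(tracks, dets, gate)
    h_t = [relu(enc_node(list(p))) for p in tracks]  # track node states
    h_d = [relu(enc_node(list(p))) for p in dets]    # detection node states
    # Edge states start from the track-to-detection offset.
    e = [relu(enc_edge([d - t for t, d in zip(tracks[i], dets[j])])) for i, j in edges]

    for _ in range(steps):
        # Node -> edge: each edge sees both endpoint states and its own state.
        e = [relu(upd_edge(h_t[i] + h_d[j] + e[k])) for k, (i, j) in enumerate(edges)]
        # Edge -> node: each node sums the states of its incident edges.
        agg_t = [[0.0] * dim for _ in tracks]
        agg_d = [[0.0] * dim for _ in dets]
        for k, (i, j) in enumerate(edges):
            agg_t[i] = [a + x for a, x in zip(agg_t[i], e[k])]
            agg_d[j] = [a + x for a, x in zip(agg_d[j], e[k])]
        h_t = [relu(upd_node(h + a)) for h, a in zip(h_t, agg_t)]
        h_d = [relu(upd_node(h + a)) for h, a in zip(h_d, agg_d)]

    # Edge classifier: a score that, after training, would estimate whether
    # track i and detection j belong to the same object.
    return {edge: sigmoid(classify(e[k])[0]) for k, edge in enumerate(edges)}


def greedy_associate(scores, threshold=0.5):
    """One-to-one matching that accepts the highest-scoring edges first."""
    used_t, used_d, pairs = set(), set(), []
    for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s >= threshold and i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return pairs


# Usage: two existing tracks, three new detections (the third is far from both).
tracks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
dets = [(0.3, 0.1, 0.0), (10.2, -0.1, 0.0), (30.0, 5.0, 0.0)]
scores = message_passing(tracks, dets)
pairs = greedy_associate(scores)
# Detections left unmatched become candidates for new tracks.
matched = {j for _, j in pairs}
new_track_candidates = [j for j in range(len(dets)) if j not in matched]
```

Sum aggregation keeps each node update independent of the order of its neighbors. Because the weights here are untrained, the actual scores are arbitrary. The greedy matcher is a simple stand-in; an optimal assignment solver such as the Hungarian algorithm would be a common alternative. Unmatched detections, like the distant third one, show how such a graph can feed track initialization.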
