SoDA: Multi-Object Tracking with Soft Data Association

Robust multi-object tracking (MOT) is a prerequisite fora safe deployment of self-driving cars. Tracking objects, however, remains a highly challenging problem, especially in cluttered autonomous driving scenes in which objects tend to interact with each other in complex ways and frequently get occluded. We propose a novel approach to MOT that uses attention to compute track embeddings that encode the spatiotemporal dependencies between observed objects. This attention measurement encoding allows our model to relax hard data associations, which may lead to unrecoverable errors. Instead, our model aggregates information from all object detections via soft data associations. The resulting latent space representation allows our model to learn to reason about occlusions in a holistic data-driven way and maintain track estimates for objects even when they are occluded. Our experimental results on the Waymo OpenDataset suggest that our approach leverages modern large-scale datasets and performs favorably compared to the state of the art in visual multi-object tracking.

[1]  Andreas Geiger,et al.  FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[5]  Pascal Fua,et al.  Tracking Interacting Objects Using Intertwined Flows , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Bin Yang,et al.  Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Bastian Leibe,et al.  Combined image- and world-space tracking in traffic scenes , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  K. Madhava Krishna,et al.  Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Raquel Urtasun,et al.  End-to-end Learning of Multi-sensor 3D Tracking by Detection , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Andreas Geiger,et al.  MOTS: Multi-Object Tracking and Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Afshin Dehghan,et al.  GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs , 2012, ECCV.

[16]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[17]  Margrit Betke,et al.  Coupling detection and data association for multiple object tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Wongun Choi,et al.  Deep Network Flow for Multi-object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Bohyung Han,et al.  Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[22]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[23]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSIORTATION tROBLEMS* , 1957 .

[24]  Yu-Wing Tai,et al.  Accurate Single Stage Detector Using Recurrent Rolling Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[26]  Helbing,et al.  Social force model for pedestrian dynamics. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[27]  Yee Whye Teh,et al.  Set Transformer , 2018, ICML.

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Yiming Yang,et al.  Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.

[30]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Haibin Ling,et al.  FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Dietrich Paulus,et al.  Simple online and realtime tracking with a deep association metric , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[33]  Long Chen,et al.  Real-Time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[34]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[36]  Ingemar J. Cox,et al.  An Efficient Implementation of Reid's Multiple Hypothesis Tracking Algorithm and Its Evaluation for the Purpose of Visual Tracking , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Chen Sun,et al.  Stochastic Prediction of Multi-Agent Interactions from Partial Observations , 2019, ICLR.

[38]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[39]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[40]  Ashish Vaswani,et al.  Self-Attention with Relative Position Representations , 2018, NAACL.

[41]  James M. Rehg,et al.  Multi-object Tracking with Neural Gating Using Bilinear LSTM , 2018, ECCV.

[42]  Bernt Schiele,et al.  Multi-person Tracking by Multicut and Deep Matching , 2016, ECCV Workshops.

[43]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[44]  Navdeep Jaitly,et al.  Pointer Networks , 2015, NIPS.

[45]  Bernt Schiele,et al.  Subgraph decomposition for multi-target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Trevor Darrell,et al.  Joint Monocular 3D Vehicle Detection and Tracking , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Pascal Fua,et al.  Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[49]  Konrad Schindler,et al.  Learning by Tracking: Siamese CNN for Robust Target Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[50]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[52]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[53]  Silvio Savarese,et al.  Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[54]  Lukasz Kaiser,et al.  Universal Transformers , 2018, ICLR.