Paper Information - Track to Detect and Segment: An Online Multi-Object Tracker

Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), which exploits tracking clues to assist detection end-to-end. TraDeS infers object tracking offsets from a cost volume and uses them to propagate previous object features to improve current object detection and segmentation. The effectiveness and superiority of TraDeS are shown on four datasets: MOT (2D tracking), nuScenes (3D tracking), and MOTS and YouTube-VIS (instance segmentation tracking). Project page: https://jialianwu.com/projects/TraDeS.html.
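To make the cost-volume idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes per-location embedding vectors on a small grid, a dot-product similarity, and a soft-argmax over the whole volume; the function names `cost_volume` and `tracking_offsets` are hypothetical. It covers only the offset inference step, not the feature propagation or the detection and segmentation heads.

```python
import math

def cost_volume(feat_cur, feat_prev):
    """Dot-product similarity between every pair of locations.

    feat_cur, feat_prev: H x W grids of equal-length embedding vectors.
    Returns cost[i][j][k][l] = <feat_cur[i][j], feat_prev[k][l]>.
    """
    return [[[[sum(a * b for a, b in zip(cur, prev)) for prev in prev_row]
              for prev_row in feat_prev]
             for cur in cur_row]
            for cur_row in feat_cur]

def tracking_offsets(feat_cur, feat_prev):
    """Soft-argmax over the cost volume (an assumption of this sketch).

    Returns, for each current location (i, j), the expected displacement
    (d_row, d_col) pointing to its best match in the previous frame.
    """
    offsets = []
    for i, plane_row in enumerate(cost_volume(feat_cur, feat_prev)):
        out_row = []
        for j, plane in enumerate(plane_row):
            peak = max(max(r) for r in plane)  # subtract the max for a stable softmax
            weights = [[math.exp(c - peak) for c in r] for r in plane]
            total = sum(map(sum, weights))
            exp_row = sum(k * sum(r) for k, r in enumerate(weights)) / total
            exp_col = sum(l * wt for r in weights for l, wt in enumerate(r)) / total
            out_row.append((exp_row - i, exp_col - j))
        offsets.append(out_row)
    return offsets

def one_hot(size, index, scale=20.0):
    """Toy embedding: a scaled one-hot vector, unique per location."""
    return [scale if n == index else 0.0 for n in range(size)]

# Toy data: the current frame is the previous frame shifted one column to the right.
h, w = 3, 4
prev = [[one_hot(h * w, k * w + l) for l in range(w)] for k in range(h)]
cur = [[prev[i][(j - 1) % w] for j in range(w)] for i in range(h)]

offsets = tracking_offsets(cur, prev)
print(offsets[1][2])  # close to (0.0, -1.0): the match lies one column to the left
```

The full volume grows with the square of the number of locations, so a sketch like this is only practical on small or downsampled feature maps.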
