Paper information - DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association. This pipeline is motivated partly by recent progress in both object detection and re-ID, and partly by biases in existing tracking datasets, where most objects have distinguishing appearance and re-ID models are sufficient for establishing associations. In response to this bias, we re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation. As the dataset contains mostly group dancing videos, we name it “DanceTrack”. We expect DanceTrack to provide a better platform for developing MOT algorithms that rely less on visual discrimination and depend more on motion analysis. We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks. The dataset, project code and competition server are released at: https://github.com/DanceTrack.
