A Robust MTMC Tracking System for AI-City Challenge 2021

Multi-Target Multi-Camera (MTMC) tracking is an essential task in intelligent city and traffic analysis. It is highly challenging because of heavy occlusion and the appearance variation caused by differing camera perspectives and congested traffic. In this paper, we propose a practical framework for the MTMC problem. The framework consists of three stages. First, in the vehicle detection and Re-ID stage, the system uses Cascade R-CNN to detect all vehicles in every camera and a Re-ID module to extract their appearance features. Second, in the Multi-Target Single-Camera (MTSC) tracking stage, it uses the detected boxes and appearance features to track multiple vehicles and generate candidate trajectories within each camera, using the Tracklet-Plane Matching (TPM) tracking algorithm. Finally, in the Inter-Camera Association (ICA) stage, it associates the candidate trajectories between each pair of successive cameras using a distance matrix, and combines the pairwise matching results for the final submission. The distance matrix is computed from the Re-ID features and then refined with constraints on travel time, road structure, and traffic rules, which reduces the search space and speeds up matching. Extensive experiments on the public Track 3 test set of the NVIDIA AI City Challenge 2021 demonstrate the effectiveness of our method, which achieves an IDF1 of 77.87%.
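The ICA stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of cosine distance, the travel-time window, the `blocked` value, and the `max_dist` threshold are all assumptions made for illustration, and a simple greedy matcher stands in for whatever assignment procedure the paper actually uses. Only the travel-time constraint is shown; road-structure and traffic-rule constraints could mask further entries in the same way.

```python
import numpy as np


def masked_distance_matrix(feats_a, feats_b, exit_times_a, entry_times_b,
                           min_travel, max_travel, blocked=1e6):
    """Cosine distance between trajectory features of two successive cameras.

    Pairs whose travel time falls outside [min_travel, max_travel] are set to
    `blocked` so they can never be matched (hypothetical constraint form).
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    dist = 1.0 - a @ b.T  # shape: (num trajectories in A, num in B)

    # Travel time from leaving camera A to entering camera B, for every pair.
    travel = entry_times_b[None, :] - exit_times_a[:, None]
    feasible = (travel >= min_travel) & (travel <= max_travel)
    return np.where(feasible, dist, blocked)


def greedy_match(dist, max_dist=0.5):
    """One-to-one matching that takes the smallest remaining distance first.

    This is an illustrative stand-in, not an optimal assignment solver.
    """
    pairs = []
    used_rows, used_cols = set(), set()
    for flat in np.argsort(dist, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat, dist.shape))
        if dist[i, j] > max_dist:
            break  # all remaining pairs are too dissimilar or blocked
        if i in used_rows or j in used_cols:
            continue
        pairs.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
    return pairs
```

Masking infeasible pairs before matching is what shrinks the search space: blocked entries are skipped by the distance threshold, so only pairs that are plausible in both appearance and timing compete for a match.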
