Pixel-Guided Association for Multi-Object Tracking

Propagation and association in Multi-Object Tracking (MOT) are central to accurately linking the trajectories of moving objects. Recent deep learning models address these tasks with separate, fragmented solutions for each sub-problem, such as appearance modeling, motion modeling, and object association. To unify the MOT task, we introduce a pixel-guided approach that efficiently builds a joint detection and tracking framework for multi-object tracking. Specifically, up-sampled multi-scale features from consecutive frames are queued and passed to a transformer decoder to detect object locations, and per-pixel distributions are used to compute the association matrix according to the object queries. Additionally, we introduce a long-term appearance association on track features, which learns to associate tracks with detections over long time spans and yields a similarity matrix. Finally, this similarity matrix is integrated with the Byte-Tracker, resulting in state-of-the-art MOT performance. Experiments on the standard MOT15 and MOT17 benchmarks show that our approach achieves strong tracking performance.
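The final step, combining an appearance similarity matrix with a box-overlap tracker, can be illustrated with a minimal sketch. This is not the method described above: the weight `alpha`, the threshold, the dictionary layout of tracks and detections, and the greedy matching (used here in place of an optimal assignment solver and of the Byte-Tracker's own association logic) are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fused_similarity(tracks, detections, alpha=0.5):
    """Track-by-detection matrix mixing appearance and overlap.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    return [
        [alpha * cosine(t["feat"], d["feat"]) + (1 - alpha) * iou(t["box"], d["box"])
         for d in detections]
        for t in tracks
    ]

def greedy_match(sim, threshold=0.3):
    """Greedy one-to-one matching: take the highest remaining score first."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(sim) for j, s in enumerate(row)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for s, i, j in pairs:
        if s < threshold:
            break  # all remaining pairs score lower still
        if i in used_tracks or j in used_dets:
            continue
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return sorted(matches)

# Toy data: two tracks and two detections listed in swapped order.
tracks = [
    {"feat": [1.0, 0.0], "box": (0, 0, 10, 10)},
    {"feat": [0.0, 1.0], "box": (20, 20, 30, 30)},
]
detections = [
    {"feat": [0.0, 1.0], "box": (21, 21, 31, 31)},
    {"feat": [1.0, 0.0], "box": (1, 1, 11, 11)},
]
matches = greedy_match(fused_similarity(tracks, detections))
```

Here `matches` pairs track 0 with detection 1 and track 1 with detection 0, since both the appearance and the overlap terms agree. In practice an optimal assignment solver would usually replace the greedy loop.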
