Online Multiple Object Tracking with Cross-Task Synergy

Modern online multiple object tracking (MOT) methods usually focus on two directions to improve tracking performance. One is to predict new positions in an incoming frame based on tracking information from previous frames, and the other is to enhance data association by generating more discriminative identity embeddings. Some works combined both directions within one framework but handled them as two individual tasks, thus gaining little mutual benefits. In this paper, we propose a novel unified model with synergy between position prediction and embedding association. The two tasks are linked by temporal-aware target attention and distractor attention, as well as identity-aware memory aggregation model. Specifically, the attention modules can make the prediction focus more on targets and less on distractors, therefore more reliable embeddings can be extracted accordingly for association. On the other hand, such reliable embeddings can boost identity-awareness through memory aggregation, hence strengthen attention modules and suppress drifts. In this way, the synergy between position prediction and embedding association is achieved, which leads to strong robustness to occlusions. Extensive experiments demonstrate the superiority of our proposed model over a wide range of existing methods on MOTChallenge benchmarks. Our code and models are publicly available at https://github.com/songguocode/TADAM.

[1]  Jiandong Tian,et al.  Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets , 2020, IEEE Transactions on Image Processing.

[2]  Jian Wang,et al.  TPM: Multiple object tracking with tracklet-plane matching , 2020, Pattern Recognit..

[3]  Bin Liu,et al.  GSM: Graph Similarity Model for Multi-Object Tracking , 2020, IJCAI.

[4]  Bodo Rosenhahn,et al.  Lifted Disjoint Paths with Application in Multiple Object Tracking , 2020, ICML.

[5]  Ruigang Yang,et al.  A Unified Object Motion and Affinity Model for Online Multi-Object Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Long Lan,et al.  Semi-online Multi-people Tracking by Re-identification , 2020, International Journal of Computer Vision.

[7]  L. Leal-Taix'e,et al.  Learning a Neural Solver for Multiple Object Tracking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  R. Horaud,et al.  How to Train Your Deep Multi-Object Tracker , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yang Zhang,et al.  Iterative Multiple Hypothesis Tracking With Tracklet-Level Association , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11]  Ling Shao,et al.  Multiobject Tracking by Submodular Optimization , 2019, IEEE Transactions on Cybernetics.

[12]  Yue Cao,et al.  Spatial-Temporal Relation Networks for Multi-Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Haibin Ling,et al.  FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Nanning Zheng,et al.  SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Wei Wu,et al.  Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification , 2019, ArXiv.

[17]  Haibin Ling,et al.  Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[18]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[19]  Wei Xu,et al.  Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking , 2018, ArXiv.

[20]  Long Chen,et al.  Real-Time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[21]  Liang Xiao,et al.  Multi-Object Tracking with Correlation Filter for Autonomous Vehicle , 2018, Sensors.

[22]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Wen Gao,et al.  Interacting Tracklets for Multi-Object Tracking , 2018, IEEE Transactions on Image Processing.

[24]  Sridha Sridharan,et al.  Tracking by Prediction: A Deep Generative Model for Mutli-person Localisation and Tracking , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[25]  Jianhua Hou,et al.  Multiple Target Tracking by Learning Feature Representation and Distance Metric Jointly , 2018, ArXiv.

[26]  Andrew Zisserman,et al.  Detect to Track and Track to Detect , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Pascal Fua,et al.  Non-Markovian Globally Consistent Multi-object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Lu Wang,et al.  Online multiple object tracking via flow and convolutional features , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[29]  Volker Eiselein,et al.  High-Speed tracking-by-detection without using image information , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[30]  Xianming Liu,et al.  Greedy Batch-Based Minimum-Cost Flows for Tracking Multiple Objects , 2017, IEEE Transactions on Image Processing.

[31]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Wongun Choi,et al.  Deep Network Flow for Multi-object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Dietrich Paulus,et al.  Simple online and realtime tracking with a deep association metric , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[34]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[37]  Pascal Fua,et al.  Tracking Interacting Objects Using Intertwined Flows , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[39]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[40]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Zhenyu He,et al.  Connected Component Model for Multi-Object Tracking , 2016, IEEE Transactions on Image Processing.

[43]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[44]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[47]  Pascal Fua,et al.  What Players do with the Ball: A Physically Constrained Interaction Modeling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Ian D. Reid,et al.  Joint Probabilistic Data Association Revisited , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  James M. Rehg,et al.  Multiple Hypothesis Tracking Revisited , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Pascal Fua,et al.  Tracking Interacting Objects Optimally Using Integer Programming , 2014, ECCV.

[53]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[54]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[55]  Pascal Fua,et al.  Take your eyes off the ball: Improving ball-tracking by focusing on team play , 2014, Comput. Vis. Image Underst..

[56]  John W. Fisher,et al.  Topology-Constrained Layered Tracking with Latent Flow , 2013, 2013 IEEE International Conference on Computer Vision.

[57]  Huadong Meng,et al.  A Multiple Hypothesis Tracking Method for Extended Target Tracking , 2010, 2010 International Conference on Electrical and Control Engineering.

[58]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Michael Beetz,et al.  Tracking humans interacting with the environment using efficient hierarchical sampling and layered observation models , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[60]  Xin Li,et al.  Layered Representation for Pedestrian Detection and Tracking in Infrared Imagery , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[61]  S. Shankar Sastry,et al.  Markov Chain Monte Carlo Data Association for Multiple-Target Tracking , 2005, CDC 2005.

[62]  S.S. Blackman,et al.  Multiple hypothesis tracking for multiple target tracking , 2004, IEEE Aerospace and Electronic Systems Magazine.

[63]  Yaakov Bar-Shalom,et al.  Multi-target tracking using joint probabilistic data association , 1980, 1980 19th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.