Joint Detection and Multi-Object Tracking with Graph Neural Networks

Object detection and data association are critical components in multi-object tracking (MOT) systems. Despite the fact that these two components are highly dependent on each other, one popular trend in MOT is to perform detection and data association as separate modules, processed in a cascaded order. Due to this cascaded process, the resulting MOT system can only perform forward inference and cannot back-propagate error through the entire pipeline and correct them. This leads to sub-optimal performance over the total pipeline. To address this issue, recent work jointly optimizes detection and data association and forms an integrated MOT approach, which has been shown to improve performance in both detection and tracking. In this work, we propose a new approach for joint MOT based on Graph Neural Networks (GNNs). The key idea of our approach is that GNNs can explicitly model complex interactions between multiple objects in both the spatial and temporal domains, which is essential for learning discriminative features for detection and data association. We also leverage the fact that motion features are useful for MOT when used together with appearance features. So our proposed joint MOT approach also incorporates appearance and motion features within our graph-based feature learning framework, leading to better feature learning for MOT. Through extensive experiments on the MOT challenge dataset, we show that our proposed method achieves state-of-the-art performance on both object detection and MOT.

[1]  Qing Zhao,et al.  Multi-Object Tracking Using Online Metric Learning with Long Short-Term Memory , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[2]  Kris Kitani,et al.  When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous , 2020, ArXiv.

[3]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Kris Kitani,et al.  A Baseline for 3D Multi-Object Tracking , 2019, ArXiv.

[6]  Chong Wang,et al.  Attention-based Graph Neural Network for Semi-supervised Learning , 2018, ArXiv.

[7]  Yu Liu,et al.  POI: Multiple Object Tracking with High Performance Detection and Appearance Feature , 2016, ECCV Workshops.

[8]  Eshed Ohn-Bar,et al.  Forecasting Time-to-Collision from Monocular Video: Feasibility, Dataset, and Challenges , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[9]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[10]  Yuhao Xu,et al.  A unified neural network for object detection, multiple object tracking and vehicle re-identification , 2019, ArXiv.

[11]  Liang Zheng,et al.  Towards Real-Time Multi-Object Tracking , 2020, ECCV.

[12]  Long Chen,et al.  Online multi-object tracking with convolutional neural networks , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Junliang Xing,et al.  Online Multi-Target Tracking with Tensor-Based High-Order Graph Matching , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[15]  Zhengyi Luo,et al.  Learning Shape Representations for Clothing Variations in Person Re-Identification , 2020, ArXiv.

[16]  Pascal Fua,et al.  Non-Markovian Globally Consistent Multi-object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Wei Wu,et al.  Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification , 2019, ArXiv.

[18]  Philip H. S. Torr,et al.  Dual Graph Convolutional Network for Semantic Segmentation , 2019, BMVC.

[19]  Jianren Wang,et al.  3D Multi-Object Tracking: A Baseline and New Evaluation Metrics , 2019 .

[20]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[21]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[22]  Jinjun Wang,et al.  Frame-wise Motion and Appearance for Real-time Multiple Object Tracking , 2019, ArXiv.

[23]  Krzysztof Czarnecki,et al.  FANTrack: 3D Multi-Object Tracking with Feature Association Network , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[24]  R. E. Kalman,et al.  A New Approach to Linear Filtering and Prediction Problems , 2002 .

[25]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[26]  Jianren Wang,et al.  Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting , 2020, CoRL.

[27]  Haibin Ling,et al.  FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Fabio Poiesi,et al.  Online Multi-target Tracking with Strong and Weak Detections , 2016, ECCV Workshops.

[29]  Afshin Dehghan,et al.  GMMCP tracker: Globally optimal Generalized Maximum Multi Clique problem for multiple object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Konrad Schindler,et al.  Continuous Energy Minimization for Multitarget Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Wenjun Zeng,et al.  FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. , 2020 .

[33]  Claudio Gennaro,et al.  Virtual to Real Adaptation of Pedestrian Detectors , 2020, Sensors.

[34]  Alberto Ferreira de Souza,et al.  Self-Driving Cars: A Survey , 2019, Expert Syst. Appl..

[35]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[36]  Bernt Schiele,et al.  Multi-person Tracking by Multicut and Deep Matching , 2016, ECCV Workshops.

[37]  Ming-Hsuan Yang,et al.  Bayesian Multi-object Tracking Using Motion Context from Multiple Objects , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[38]  Yong Jae Lee,et al.  Video Object Detection with an Aligned Spatial-Temporal Memory , 2017, ECCV.

[39]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[40]  Mohammad Rahmati,et al.  Multi-target tracking using CNN-based features: CNNMTT , 2018, Multimedia Tools and Applications.

[41]  Seung-Hwan Bae,et al.  Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[44]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[45]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[46]  Qi Tian,et al.  Person Re-identification in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Cewu Lu,et al.  TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Dietrich Paulus,et al.  Simple online and realtime tracking with a deep association metric , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[49]  Wongun Choi,et al.  Deep Network Flow for Multi-object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Takeo Kanade,et al.  Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator , 2016, ArXiv.

[53]  Mohamed R. Amer,et al.  Multiobject tracking as maximum weight independent set , 2011, CVPR 2011.

[54]  Yichen Wei,et al.  Towards High Performance Video Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[56]  Gang Wang,et al.  Graininess-Aware Deep Feature Learning for Pedestrian Detection , 2018, ECCV.

[57]  Andreas Geiger,et al.  MOTS: Multi-Object Tracking and Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  F. Scarselli,et al.  A new model for learning in graph domains , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[59]  Senthil Yogamani,et al.  RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving , 2019, ArXiv.

[60]  Bodo Rosenhahn,et al.  Lifted Disjoint Paths with Application in Multiple Object Tracking , 2020, ICML.

[61]  Zhedong Zheng,et al.  Joint Discriminative and Generative Learning for Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[63]  Kris Kitani,et al.  Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[64]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[65]  Mubarak Shah,et al.  Deep Affinity Network for Multiple Object Tracking , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Zuozhuo Dai,et al.  Batch DropBlock Network for Person Re-Identification and Beyond , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[67]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Jiahe Li,et al.  Graph Networks for Multiple Object Tracking , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[69]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Alexander Carballo,et al.  A Survey of Autonomous Driving: Common Practices and Emerging Technologies , 2019, IEEE Access.

[71]  Francisco Herrera,et al.  Deep Learning in Video Multi-Object Tracking: A Survey , 2019, Neurocomputing.

[72]  Sen Wang,et al.  Deep Reinforcement Learning for Autonomous Driving , 2018, ArXiv.

[73]  Kris Kitani,et al.  GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Pietro Perona,et al.  Pedestrian detection: A benchmark , 2009, CVPR.

[75]  Andrew Zisserman,et al.  Detect to Track and Track to Detect , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[76]  Feiyue Huang,et al.  Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking , 2020, ECCV.

[77]  Silvio Savarese,et al.  Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[78]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[79]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[80]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[81]  Shuicheng Yan,et al.  Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[83]  Bin Yang,et al.  Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[84]  Silvio Savarese,et al.  Recurrent Autoregressive Networks for Online Multi-object Tracking , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[85]  Kuk-Jin Yoon,et al.  Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  Wongun Choi,et al.  Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[87]  Martin Grohe,et al.  Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks , 2018, AAAI.

[88]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[89]  Han-Ul Kim,et al.  CDT: Cooperative Detection and Tracking for Tracing Multiple Objects in Video Sequences , 2016, ECCV.

[90]  Xiaogang Wang,et al.  Joint Detection and Identification Feature Learning for Person Search , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[91]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[92]  Daniel Cremers,et al.  MOT20: A benchmark for multi object tracking in crowded scenes , 2020, ArXiv.

[93]  Han Wang,et al.  Multiple Object Tracking With Attention to Appearance, Structure, Motion and Size , 2019, IEEE Access.

[94]  Kris M. Kitani,et al.  Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling , 2020, ArXiv.

[95]  Jianren Wang,et al.  Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting , 2020, ArXiv.

[96]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[97]  Jenq-Neng Hwang,et al.  Exploit the Connectivity: Multi-Object Tracking with TrackletNet , 2018, ACM Multimedia.

[98]  Keita Higuchi,et al.  BBeep: A Sonic Collision Avoidance System for Blind Travellers and Nearby Pedestrians , 2019, CHI.

[99]  Laura Leal-Taix'e,et al.  Learning a Neural Solver for Multiple Object Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[100]  Soumith Chintala,et al.  A MultiPath Network for Object Detection , 2016, BMVC.

[101]  Larry S. Davis,et al.  R-FCN-3000 at 30fps: Decoupling Detection and Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[102]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[103]  Guoqiang Yu,et al.  muSSP: Efficient Min-cost Flow Algorithm for Multi-object Tracking , 2019, NeurIPS.

[104]  Afshin Dehghan,et al.  GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs , 2012, ECCV.

[105]  Kris M. Kitani,et al.  Rotational Rectification Network: Enabling Pedestrian Detection for Mobile Vision , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[106]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[107]  Xiangyu Zhang,et al.  Bounding Box Regression With Uncertainty for Accurate Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[108]  Ajmal Mian,et al.  Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking , 2020, ECCV.

[109]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[110]  Daniel Cremers,et al.  CVPR19 Tracking and Detection Challenge: How crowded can it get? , 2019, ArXiv.

[111]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[112]  Vladlen Koltun,et al.  Tracking Objects as Points , 2020, ECCV.