Learning a Neural Solver for Multiple Object Tracking

Graphs offer a natural way to formulate Multiple Object Tracking (MOT) within the tracking-by-detection paradigm. However, they also introduce a major challenge for learning methods, as defining a model that can operate on such structured domain is not trivial. As a consequence, most learning-based work has been devoted to learning better features for MOT and then using these with well-established optimization frameworks. In this work, we exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs). By operating directly on the graph domain, our method can reason globally over an entire set of detections and predict final solutions. Hence, we show that learning in MOT does not need to be restricted to feature extraction, but it can also be applied to the data association step. We show a significant improvement in both MOTA and IDF1 on three publicly available benchmarks. Our code is available at https://bit.ly/motsolv.

[1]  Li Fei-Fei,et al.  Neural Graph Matching Networks for Fewshot 3D Action Recognition , 2018, ECCV.

[2]  Long Chen,et al.  Aggregate Tracklet Appearance Features for Multi-Object Tracking , 2019, IEEE Signal Processing Letters.

[3]  Fan Yang,et al.  Trajectory Factory: Tracklet Cleaving and Re-Connection by Deep Siamese Bi-GRU for Multiple Object Tracking , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[4]  Yang Zhang,et al.  Heterogeneous Association Graph Fusion for Target Association in Multiple Object Tracking , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[5]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[7]  Bodo Rosenhahn,et al.  Everybody needs somebody: Modeling social and grouping behavior on a linear programming multiple people tracker , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[8]  Svetlana Lazebnik,et al.  Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering , 2018, NeurIPS.

[9]  Thomas Brox,et al.  Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  S. Savarese,et al.  Learning an Image-Based Motion Context for Multiple People Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[15]  Bodo Rosenhahn,et al.  Branch-and-price global optimization for multi-view multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[17]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Raquel Urtasun,et al.  End-to-end Learning of Multi-sensor 3D Tracking by Detection , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Afshin Dehghan,et al.  GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs , 2012, ECCV.

[20]  Haibin Ling,et al.  FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Bodo Rosenhahn,et al.  Improvements to Frank-Wolfe optimization for multi-detector multi-object tracking , 2017, ArXiv.

[22]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Kwangjin Yoon,et al.  Online Multi-Object Tracking with Historical Appearance Matching and Scene Adaptive Detection Filtering , 2018, 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[24]  Long Chen,et al.  Real-Time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[25]  Wongun Choi,et al.  Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Tianzhu Zhang,et al.  Graph Convolutional Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Gérard G. Medioni,et al.  Multiple Target Tracking Using Spatio-Temporal Markov Chain Monte Carlo Data Association , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29]  Ian D. Reid,et al.  Joint tracking and segmentation of multiple targets , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Luc Van Gool,et al.  Customized Multi-person Tracker , 2018, ACCV.

[31]  James M. Rehg,et al.  Multi-object Tracking with Neural Gating Using Bilinear LSTM , 2018, ECCV.

[32]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[34]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[35]  Wongun Choi,et al.  Deep Network Flow for Multi-object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Charless C. Fowlkes,et al.  Learning Optimal Parameters For Multi-target Tracking , 2015, BMVC.

[37]  Pascal Fua,et al.  Robust People Tracking with Global Trajectory Optimization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[38]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[39]  Haibin Ling,et al.  Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[40]  Luc Van Gool,et al.  Robust tracking-by-detection using a detector confidence particle filter , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[42]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[44]  Zhuwen Li,et al.  Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search , 2018, NeurIPS.

[45]  Silvio Savarese,et al.  A Unified Framework for Multi-target Tracking and Collective Activity Recognition , 2012, ECCV.

[46]  Jan Feyereisl,et al.  Online Multi-target Tracking by Large Margin Structured Learning , 2012, ACCV.

[47]  Margrit Betke,et al.  Efficient track linking methods for track graphs using network-flow and set-cover techniques , 2011, CVPR 2011.

[48]  Luc Van Gool,et al.  You'll never walk alone: Modeling social behavior for multi-target tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[49]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[50]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[51]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Yang Zhang,et al.  Iterative Multiple Hypothesis Tracking With Tracklet-Level Association , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[53]  Ravindra K. Ahuja,et al.  Network Flows: Theory, Algorithms, and Applications , 1993 .

[54]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[55]  Dietrich Paulus,et al.  Global data association for the Probability Hypothesis Density filter using network flows , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[56]  Konrad Schindler,et al.  Learning by Tracking: Siamese CNN for Robust Target Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[57]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Yue Cao,et al.  Spatial-Temporal Relation Networks for Multi-Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[59]  Long Chen,et al.  Online multi-object tracking with convolutional neural networks , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[60]  James J. Little,et al.  A Linear Programming Approach for Multiple Object Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Bohyung Han,et al.  Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[63]  Bodo Rosenhahn,et al.  Tracking with multi-level features , 2016, ArXiv.

[64]  Jing Zhang,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[65]  Joan Bruna,et al.  Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[66]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[67]  Xiaogang Wang,et al.  DeepReID: Deep Filter Pairing Neural Network for Person Re-identification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[69]  Carlo Tomasi,et al.  Features for Multi-target Multi-camera Tracking and Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[71]  Bernt Schiele,et al.  Subgraph decomposition for multi-target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[72]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[74]  Pascal Fua,et al.  Multicamera People Tracking with a Probabilistic Occupancy Map , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[76]  Bernt Schiele,et al.  Multi-person Tracking by Multicut and Deep Matching , 2016, ECCV Workshops.