Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets

State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating per-frame outputs of object detectors. In crowded scenes, however, detectors often fail to obtain accurate detections due to heavy occlusions and high crowd density. In this paper, we propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we jointly model detection, counting, and tracking of multiple targets as a network flow program, which simultaneously finds the global optimal detections and trajectories of multiple targets over the whole video. This is in contrast to prior MOT methods that either ignore the crowd density and thus are prone to errors in crowded scenes, or rely on a suboptimal two-step process using heuristic density-aware point-tracks for matching targets. Our approach yields promising results on public benchmarks of various domains including people tracking, cell tracking, and fish tracking.

[1]  Wei Wu,et al.  Online Inter-Camera Trajectory Association Exploiting Person Re-Identification and Camera Topology , 2018, ACM Multimedia.

[2]  Ramakant Nevatia,et al.  An online learned CRF model for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Greg J. Stephens,et al.  Towards Dense Object Tracking in a 2D Honeybee Hive , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  J. Ferryman,et al.  PETS2009: Dataset and challenge , 2009, 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.

[5]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[6]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Antoni B. Chan,et al.  Small instance detection by integer programming on object density maps , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xinggang Wang,et al.  A Simple Baseline for Multi-Object Tracking , 2020, ArXiv.

[9]  Robert T. Collins,et al.  Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Dariu Gavrila,et al.  A Bayesian Framework for Multi-cue 3D Object Tracking , 2004, ECCV.

[12]  Haroon Idrees,et al.  Multi-source Multi-scale Counting in Extremely Dense Crowd Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Konrad Schindler,et al.  Multi-target tracking by continuous energy minimization , 2011, CVPR 2011.

[14]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[16]  Dinesh Manocha,et al.  AdaPT: Real-time adaptive pedestrian tracking for crowded scenes , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Antoni B. Chan,et al.  Incorporating Side Information by Adaptive Convolution , 2017, International Journal of Computer Vision.

[18]  Nuno Vasconcelos,et al.  Counting People With Low-Level Features and Bayesian Regression , 2012, IEEE Transactions on Image Processing.

[19]  S. Shankar Sastry,et al.  Markov Chain Monte Carlo Data Association for Multi-Target Tracking , 2009, IEEE Transactions on Automatic Control.

[20]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[21]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Simon Lucey,et al.  Learning Policies for Adaptive Tracking with Deep Feature Cascades , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Pascal Fua,et al.  Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  Ullrich Köthe,et al.  Learning to count with regression forest and structured labels , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[27]  Dit-Yan Yeung,et al.  Spatiotemporal Modeling for Crowd Counting in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[29]  Ming-Hsuan Yang,et al.  Exploiting Hierarchical Dense Structures on Hypergraphs for Multi-Object Tracking , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Dacheng Tao,et al.  Subspaces Indexing Model on Grassmann Manifold for Image Search , 2011, IEEE Transactions on Image Processing.

[31]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Ce Liu,et al.  Exploring new representations and applications for motion analysis , 2009 .

[33]  Ivan Laptev,et al.  Density-aware person detection and tracking in crowds , 2011, ICCV.

[34]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Ameya Prabhu,et al.  Simple Unsupervised Multi-Object Tracking , 2020, ArXiv.

[36]  Wenhan Luo,et al.  Multiple object tracking: A literature review , 2014, Artif. Intell..

[37]  Bohyung Han,et al.  Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Pascal Fua,et al.  Non-Markovian Globally Consistent Multi-object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Jean-Luc Dugelay,et al.  Spatio-temporal crowd density model in a human detection and tracking framework , 2015, Signal Process. Image Commun..

[40]  James M. Rehg,et al.  Multiple Hypothesis Tracking Revisited , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[41]  Jonathon A. Chambers,et al.  Multi-Level Cooperative Fusion of GM-PHD Filters for Online Multiple Human Tracking , 2019, IEEE Transactions on Multimedia.

[42]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[43]  Pascal Fua,et al.  What Players do with the Ball: A Physically Constrained Interaction Modeling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[45]  Ko Nishino,et al.  Tracking Pedestrians Using Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Luc Van Gool,et al.  Pedestrian detection at 100 frames per second , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Pascal Fua,et al.  Multi-Commodity Network Flow for Tracking Multiple People , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[49]  Afshin Dehghan,et al.  Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  José M. F. Moura,et al.  FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Pascal Fua,et al.  Tracking Interacting Objects Optimally Using Integer Programming , 2014, ECCV.

[52]  D. Tao,et al.  Distilling Knowledge From Graph Convolutional Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Afshin Dehghan,et al.  Target Identity-aware Network Flow for online multiple target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Nathalie Harder,et al.  An Objective Comparison of Cell Tracking Algorithms , 2017, Nature Methods.

[55]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[56]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[57]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[58]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[59]  James Black,et al.  Multi view image surveillance and tracking , 2002, Workshop on Motion and Video Computing, 2002. Proceedings..

[60]  Thomas Brox,et al.  Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Gérard G. Medioni,et al.  Tracking Using Motion Patterns for Very Crowded Scenes , 2012, ECCV.

[62]  Charless C. Fowlkes,et al.  Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions , 2016, International Journal of Computer Vision.

[63]  Pascal Fua,et al.  Globally Consistent Multi-People Tracking using Motion Patterns , 2016, ArXiv.

[64]  Vishal M. Patel,et al.  Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[65]  Pascal Fua,et al.  Tracking Interacting Objects Using Intertwined Flows , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Nathalie Harder,et al.  A benchmark for comparison of cell tracking algorithms , 2014, Bioinform..

[67]  James M. Rehg,et al.  Multi-object Tracking with Neural Gating Using Bilinear LSTM , 2018, ECCV.

[68]  Bernt Schiele,et al.  Detection and Tracking of Occluded People , 2014, International Journal of Computer Vision.

[69]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Deyu Meng,et al.  The Solution Path Algorithm for Identity-Aware Multi-object Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[72]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[73]  Wei Wang,et al.  A multiple object tracking method using Kalman filter , 2010, The 2010 IEEE International Conference on Information and Automation.

[74]  Charless C. Fowlkes,et al.  Learning Optimal Parameters For Multi-target Tracking , 2015, BMVC.

[75]  Bernt Schiele,et al.  Multi-person Tracking by Multicut and Deep Matching , 2016, ECCV Workshops.

[76]  Carlo Tomasi,et al.  Features for Multi-target Multi-camera Tracking and Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77]  Wenhan Luo,et al.  Generic Object Crowd Tracking by Multi-Task Learning , 2013, BMVC.

[78]  Long Lan,et al.  Semi-online Multi-people Tracking by Re-identification , 2020, International Journal of Computer Vision.

[79]  Wen Gao,et al.  Interacting Tracklets for Multi-Object Tracking , 2018, IEEE Transactions on Image Processing.

[80]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[81]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Shiv Surya,et al.  Switching Convolutional Neural Network for Crowd Counting , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Konrad Schindler,et al.  Discrete-continuous optimization for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[85]  Junjie Yan,et al.  Multiple Target Tracking Based on Undirected Hierarchical Relation Hypergraph , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  Konrad Schindler,et al.  Continuous Energy Minimization for Multitarget Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[87]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[88]  Ivan Laptev,et al.  On pairwise costs for network flow multi-object tracking , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Xianming Liu,et al.  Greedy Batch-Based Minimum-Cost Flows for Tracking Multiple Objects , 2017, IEEE Transactions on Image Processing.

[90]  Gang Wang,et al.  Joint Learning of Convolutional Neural Networks and Temporally Constrained Metrics for Tracklet Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[91]  James J. Little,et al.  A Linear Programming Approach for Multiple Object Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[92]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[93]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[94]  Konrad Schindler,et al.  Learning by Tracking: Siamese CNN for Robust Target Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[95]  Zunlei Feng,et al.  Factorizable Graph Convolutional Networks , 2020, NeurIPS.