Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets

State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating the per-frame outputs of object detectors. In crowded scenes, however, detectors often fail to produce accurate detections because of heavy occlusion and high crowd density. In this paper, we propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we jointly model detection, counting, and tracking of multiple targets as a network flow program, which simultaneously finds the globally optimal detections and trajectories of multiple targets over the whole video. This is in contrast to prior MOT methods, which either ignore the crowd density and are thus prone to errors in crowded scenes, or rely on a suboptimal two-step process that uses heuristic density-aware point-tracks to match targets. Our approach yields promising results on public benchmarks from several domains, including people tracking, cell tracking, and fish tracking.
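
To make the phrase "network flow program" concrete, the sketch below shows the classical min-cost-flow view of data association, in which each unit of flow from a source to a sink is one trajectory. It is not the joint density-map model proposed in the paper: it assumes hypothetical candidate points with confidences in (0, 1), and the names `MinCostFlow` and `track_by_flow`, the log-odds detection cost, the entry/exit costs, and the distance threshold are all illustrative choices rather than the authors' settings.

```python
import math
from collections import defaultdict

class MinCostFlow:
    """Unit-capacity min-cost flow solved by successive shortest paths."""
    def __init__(self):
        self.graph = defaultdict(list)   # node -> indices into self.edges
        self.edges = []                  # [to, residual capacity, cost]

    def add_edge(self, u, v, cap, cost):
        # Forward edges get even indices; edge i ^ 1 is the reverse of edge i.
        self.graph[u].append(len(self.edges)); self.edges.append([v, cap, cost])
        self.graph[v].append(len(self.edges)); self.edges.append([u, 0, -cost])

    def shortest_path(self, s, t):
        # Bellman-Ford, because detection costs may be negative.
        inf = float("inf")
        dist = defaultdict(lambda: inf); dist[s] = 0.0
        prev, nodes = {}, list(self.graph)
        for _ in range(len(nodes)):
            changed = False
            for u in nodes:
                if dist[u] == inf:
                    continue
                for ei in self.graph[u]:
                    v, cap, cost = self.edges[ei]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, ei, True
            if not changed:
                break
        return dist[t], prev

    def augment(self, s, t, prev):
        # Push one unit of flow back along the path recorded in prev.
        v = t
        while v != s:
            ei = prev[v]
            self.edges[ei][1] -= 1
            self.edges[ei ^ 1][1] += 1
            v = self.edges[ei ^ 1][0]

def track_by_flow(dets, enter=1.0, exit_=1.0, max_dist=5.0, w=0.1):
    """dets: list of (frame, x, y, p) with p in (0, 1). All costs are assumptions."""
    net = MinCostFlow()
    for i, (f, x, y, p) in enumerate(dets):
        # Confident candidates (p > 0.5) get a negative cost, so using them pays off.
        net.add_edge(("in", i), ("out", i), 1, math.log((1 - p) / p))
        net.add_edge("S", ("in", i), 1, enter)      # a trajectory may start here
        net.add_edge(("out", i), "T", 1, exit_)     # a trajectory may end here
        for j, (g, u, v, _) in enumerate(dets):
            d = math.hypot(u - x, v - y)
            if g == f + 1 and d <= max_dist:        # link to nearby next-frame candidates
                net.add_edge(("out", i), ("in", j), 1, w * d)
    while True:
        cost, prev = net.shortest_path("S", "T")
        if cost >= 0:          # another trajectory would no longer lower the total cost
            break
        net.augment("S", "T", prev)
    tracks = []
    for ei in net.graph["S"]:
        if net.edges[ei][1] == 0:                   # saturated entry edge: a track starts
            node, path = net.edges[ei][0], []
            while node != "T":
                if node[0] == "in":
                    path.append(node[1])
                # Follow the single saturated forward (even-index) edge.
                node = next(net.edges[e][0] for e in net.graph[node]
                            if e % 2 == 0 and net.edges[e][1] == 0)
            tracks.append(path)
    return tracks

# Two frames, two real targets, and one low-confidence false positive (index 4).
dets = [(0, 0.0, 0.0, 0.9), (0, 10.0, 0.0, 0.9),
        (1, 1.0, 0.0, 0.9), (1, 11.0, 0.0, 0.9), (1, 5.0, 5.0, 0.2)]
tracks = track_by_flow(dets)
print(tracks)
```

Each returned list holds the indices of the candidates that make up one trajectory. Because the number of trajectories is decided by the optimization itself (flow is added only while it reduces the total cost), the low-confidence candidate is left out without a separate thresholding step.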


[47]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Konrad Schindler,et al.  Learning by Tracking: Siamese CNN for Robust Target Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[49]  Wenhan Luo,et al.  Multiple object tracking: A literature review , 2014, Artif. Intell..

[50]  Pascal Fua,et al.  Non-Markovian Globally Consistent Multi-object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Wen Gao,et al.  Interacting Tracklets for Multi-Object Tracking , 2018, IEEE Transactions on Image Processing.

[52]  Ming-Hsuan Yang,et al.  Exploiting Hierarchical Dense Structures on Hypergraphs for Multi-Object Tracking , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Ramakant Nevatia,et al.  An online learned CRF model for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Wei Wang,et al.  A multiple object tracking method using Kalman filter , 2010, The 2010 IEEE International Conference on Information and Automation.

[55]  James J. Little,et al.  A Linear Programming Approach for Multiple Object Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[57]  Ko Nishino,et al.  Tracking Pedestrians Using Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Nathalie Harder,et al.  An Objective Comparison of Cell Tracking Algorithms , 2017, Nature Methods.

[59]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[60]  James Black,et al.  Multi view image surveillance and tracking , 2002, Workshop on Motion and Video Computing, 2002. Proceedings..

[61]  Jonathon A. Chambers,et al.  Multi-Level Cooperative Fusion of GM-PHD Filters for Online Multiple Human Tracking , 2019, IEEE Transactions on Multimedia.

[62]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[63]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Konrad Schindler,et al.  Discrete-continuous optimization for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Carlo Tomasi,et al.  Features for Multi-target Multi-camera Tracking and Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[67]  Vishal M. Patel,et al.  Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[68]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[69]  Ivan Laptev,et al.  Density-aware person detection and tracking in crowds , 2011, ICCV.

[70]  Gang Wang,et al.  Joint Learning of Convolutional Neural Networks and Temporally Constrained Metrics for Tracklet Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[71]  Simon Lucey,et al.  Learning Policies for Adaptive Tracking with Deep Feature Cascades , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[72]  Greg J. Stephens,et al.  Towards Dense Object Tracking in a 2D Honeybee Hive , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[73]  Long Lan,et al.  Semi-online Multi-people Tracking by Re-identification , 2020, International Journal of Computer Vision.

[74]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[75]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Bernt Schiele,et al.  Detection and Tracking of Occluded People , 2014, International Journal of Computer Vision.

[78]  Konrad Schindler,et al.  Multi-target tracking by continuous energy minimization , 2011, CVPR 2011.

[79]  Dacheng Tao,et al.  Subspaces Indexing Model on Grassmann Manifold for Image Search , 2011, IEEE Transactions on Image Processing.

[80]  Pascal Fua,et al.  Multi-Commodity Network Flow for Tracking Multiple People , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  Dinesh Manocha,et al.  AdaPT: Real-time adaptive pedestrian tracking for crowded scenes , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[82]  Antoni B. Chan,et al.  Incorporating Side Information by Adaptive Convolution , 2017, International Journal of Computer Vision.

[83]  Charless C. Fowlkes,et al.  Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions , 2016, International Journal of Computer Vision.

[84]  Pascal Fua,et al.  Tracking Interacting Objects Optimally Using Integer Programming , 2014, ECCV.

[85]  Antoni B. Chan,et al.  Small instance detection by integer programming on object density maps , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[86]  Bohyung Han,et al.  Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Afshin Dehghan,et al.  Target Identity-aware Network Flow for online multiple target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[89]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[91]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[92]  Robert T. Collins,et al.  Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[93]  José M. F. Moura,et al.  FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[94]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[95]  Ivan Laptev,et al.  On pairwise costs for network flow multi-object tracking , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).