Real-Time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification

Online multi-object tracking is a fundamental problem in time-critical video analysis applications. A major challenge in the popular tracking-by-detection framework is how to associate unreliable detection results with existing tracks. In this paper, we propose to handle unreliable detection by collecting candidates from outputs of both detection and tracking. The intuition behind generating redundant candidates is that detection and tracks can complement each other in different scenarios. Detection results of high confidence prevent tracking drifts in the long term, and predictions of tracks can handle noisy detection caused by occlusion. In order to apply optimal selection from a considerable amount of candidates in real-time, we present a novel scoring function based on a fully convolutional neural network, that shares most computations on the entire image. Moreover, we adopt a deeply learned appearance representation, which is trained on large-scale person re-identification datasets, to improve the identification ability of our tracker. Extensive experiments show that our tracker achieves real-time and state-of-the-art performance on a widely used people tracking benchmark.

[1]  Yuqian Wu,et al.  Multi-person tracking-by-detection with local particle filtering and global occlusion handling , 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME).

[2]  James M. Rehg,et al.  Multiple Hypothesis Tracking Revisited , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Romaric Audigier,et al.  Improving Multi-frame Data Association with Sparse Representations for Robust Near-online Multi-object Tracking , 2016, ECCV.

[6]  Seung-Hwan Bae,et al.  Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ramakant Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[10]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[11]  Wu Liu,et al.  Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[12]  Jia Chen,et al.  Rewind to track: Parallelized apprenticeship learning with backward tracklets , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[13]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Xiaogang Wang,et al.  DeepReID: Deep Filter Pairing Neural Network for Person Re-identification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[17]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[18]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Riccardo Satta,et al.  Appearance Descriptors for Person Re-identification: a Comprehensive Review , 2013, ArXiv.

[20]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Ioannis A. Kakadiaris,et al.  To Track or To Detect? An Ensemble Framework for Optimal Selection , 2012, ECCV.

[22]  Fabio Poiesi,et al.  Online Multi-target Tracking with Strong and Weak Detections , 2016, ECCV Workshops.

[23]  Bernt Schiele,et al.  Multi-person Tracking by Multicut and Deep Matching , 2016, ECCV Workshops.

[24]  Jingdong Wang,et al.  Deeply-Learned Part-Aligned Representations for Person Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).