Real-Time Online Multi-Object Tracking in Compressed Domain

Recent online multi-object tracking (MOT) methods have achieved desirable tracking performance. However, the tracking speed of most existing methods is rather slow. Inspired from the fact that the adjacent frames are highly relevant and redundant, we divide the frames into key and non-key frames and track objects in the compressed domain. For the key frames, the RGB images are restored for detection and data association. To make data association more reliable, an appearance convolutional neural network (CNN) which can be jointly trained with the detector is proposed. For the non-key frames, the objects are directly propagated by a tracking CNN based on the motion information provided in the compressed domain. Compared with the state-of-the-art online MOT methods, our tracker is about $6\times $ faster while maintaining a comparable tracking performance.

[1]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[2]  Rainer Stiefelhagen,et al.  The CLEAR 2006 Evaluation , 2006, CLEAR.

[3]  Jonathon A. Chambers,et al.  Multi-Level Cooperative Fusion of GM-PHD Filters for Online Multiple Human Tracking , 2019, IEEE Transactions on Multimedia.

[4]  Zhidong Deng,et al.  Fast Object Detection in Compressed Video , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Alexander J. Smola,et al.  Compressed Video Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Hilke Kieritz,et al.  Online multi-person tracking using Integral Channel Features , 2016, 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[8]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[9]  Silvio Savarese,et al.  Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[12]  Dietrich Paulus,et al.  Simple online and realtime tracking with a deep association metric , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[13]  Ram Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, CVPR.

[14]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Seung-Hwan Bae,et al.  Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Gang Wang,et al.  Tracklet Association with Online Target-Specific Metric Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Volker Eiselein,et al.  Sequential sensor fusion combining probability hypothesis density and kernelized correlation filters for multi-object tracking in video data , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[19]  Kwangjin Yoon,et al.  Online Multi-Object Tracking with Historical Appearance Matching and Scene Adaptive Detection Filtering , 2018, 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[20]  Ramakant Nevatia,et al.  An online learned CRF model for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Yu Liu,et al.  POI: Multiple Object Tracking with High Performance Detection and Appearance Feature , 2016, ECCV Workshops.

[22]  Hua Yang,et al.  Online Multi-Object Tracking with Dual Matching Attention Networks , 2018, ECCV.

[23]  Takashi Sato,et al.  Interpolation-Based Object Detection Using Motion Vectors for Embedded Real-time Tracking Systems , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Seung-Hwan Bae,et al.  Learning Discriminative Appearance Models for Online Multi-Object Tracking With Appearance Discriminability Measures , 2018, IEEE Access.

[25]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[26]  Fabio Poiesi,et al.  Online Multi-target Tracking with Strong and Weak Detections , 2016, ECCV Workshops.

[27]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[28]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Ramakant Nevatia,et al.  Multi-target tracking by on-line learned discriminative appearance models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[31]  Euntai Kim,et al.  Multiple Object Tracking via Feature Pyramid Siamese Networks , 2019, IEEE Access.

[32]  Ivan V. Bajic,et al.  MV-YOLO: Motion Vector-Aided Tracking by Semantic Object Detection , 2018, 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP).

[33]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Joint Video Team Draft ITU-T Recommendation and Final draft international standard of joint video specification , 2003 .

[36]  Volker Eiselein,et al.  Real-Time Multi-human Tracking Using a Probability Hypothesis Density Filter and Multiple Detectors , 2012, 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance.

[37]  Zeyu Fu,et al.  Particle PHD Filter Based Multiple Human Tracking Using Online Group-Structured Dictionary Learning , 2018, IEEE Access.