Multi-target tracking of surveillance video with differential YOLO and DeepSort

Traditional video surveillance algorithms suffer from low accuracy, poor robustness, and an inability to track multiple targets in real time. To address these shortcomings, this paper presents a multi-target tracking algorithm, DeepSort, based on deep neural networks, which performs end-to-end real-time detection and tracking of multiple people in surveillance video. Because the YOLO algorithm detects targets with high accuracy, DeepSort depends less heavily on the detection results, suffers less interference from occlusion and illumination, and tracks more robustly. Moreover, because surveillance video is highly redundant, a difference filter screens out video frames that contain no foreground targets and show only small changes, which reduces the detection cost and increases the detection and tracking speed. In an experimental evaluation on the video surveillance dataset NPLR, the algorithm achieves an average MOTA of 68.7 with a highest value of 86.8, and an average speed of 81.6 Hz with a highest value of 140 Hz. These results indicate that the end-to-end algorithm is feasible and effective.
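The difference filter can be illustrated with a minimal sketch. The abstract does not specify how the filter is computed, so everything below is an assumption made for illustration: the function names `frame_difference` and `select_frames`, the mean absolute pixel difference as the change measure, the comparison against the last frame sent to the detector, and the threshold value. It is not the authors' implementation.

```python
import numpy as np


def frame_difference(reference, current):
    """Mean absolute per-pixel difference between two grayscale frames.

    Assumption: this simple measure stands in for the paper's
    difference filter, whose exact form the abstract does not give.
    """
    ref = reference.astype(np.float32)
    cur = current.astype(np.float32)
    return float(np.mean(np.abs(cur - ref)))


def select_frames(frames, threshold=2.0):
    """Return indices of frames that would be passed on to the detector.

    The first frame is always kept. A later frame is kept only if it
    differs from the last kept frame by more than `threshold`
    (a hypothetical value that would need tuning per camera).
    """
    kept = []
    reference = None
    for index, frame in enumerate(frames):
        if reference is None or frame_difference(reference, frame) > threshold:
            kept.append(index)
            reference = frame
    return kept


if __name__ == "__main__":
    # Synthetic 8x8 grayscale frames: an empty scene and one with a bright block.
    empty = np.zeros((8, 8), dtype=np.uint8)
    occupied = empty.copy()
    occupied[2:6, 2:6] = 255
    print(select_frames([empty, empty, occupied, occupied, empty]))
```

Frames that are skipped never reach the detector, which is where the saving in detection cost would come from. A real system would also need a rule for what the tracker does on skipped frames, which this sketch leaves out.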
