A Multiple Object Tracking Algorithm Based on YOLO Detection

To further improve the accuracy and efficiency of multi-target tracking, a multi-target tracking algorithm based on YOLO is proposed. First, YOLO is applied to the video stream to detect multiple targets. Once the size, position, and other information of each target have been obtained, deep features are extracted from the detected regions only, which removes noise from unrelated regions of the image and reduces both the computational complexity and the time of feature extraction. An LSTM (long short-term memory) network then captures the temporal relationship between successive frames. Finally, Euclidean distance is used as the similarity measure for target matching and association, which completes the tracking of multiple targets in the video stream. Experiments on the public tracking data set MOT16 and on the MSR data set show that the proposed algorithm is viable for multi-target tracking.
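The final association step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that Euclidean distance measures similarity, so the greedy closest-first matching, the `max_distance` gate, the function names `euclidean` and `associate`, and the two-dimensional toy feature vectors are all assumptions made for illustration. In the described pipeline the feature vectors would come from the deep feature extractor and the LSTM.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def associate(
    tracks: Dict[int, Sequence[float]],
    detections: List[Sequence[float]],
    max_distance: float,
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match tracks to detections, closest pairs first.

    Returns (matches, unmatched): matches maps track id -> detection index,
    unmatched lists the detection indices that were not assigned to a track.
    The greedy strategy and the distance gate are illustrative assumptions.
    """
    # Every (distance, track id, detection index) pair, sorted by distance.
    pairs = sorted(
        (euclidean(feature, detection), track_id, j)
        for track_id, feature in tracks.items()
        for j, detection in enumerate(detections)
    )
    matches: Dict[int, int] = {}
    used = set()
    for distance, track_id, j in pairs:
        if distance > max_distance:
            break  # all remaining pairs are even farther apart
        if track_id in matches or j in used:
            continue  # each track and each detection is used at most once
        matches[track_id] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


# Toy example: two existing tracks and three detections in the next frame.
tracks = {1: [0.0, 0.0], 2: [5.0, 5.0]}
detections = [[5.1, 4.9], [0.2, -0.1], [20.0, 20.0]]
matches, unmatched = associate(tracks, detections, max_distance=1.0)
print(matches, unmatched)
```

In this toy example, track 2 is paired with detection 0 and track 1 with detection 1, while detection 2 lies beyond the gate and would start a new track. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop if mismatches between nearby targets become a concern.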
