Fully Convolutional Siamese Fusion Networks for Object Tracking

In this paper, we propose fully convolutional siamese fusion networks for object tracking. We adopt the fusion strategy of convolutional layers for object tracking to achieve good feature representation based on convolutional neural networks. Specifically, we fuse three convolutional layers of VGGNet based on normalized cross correlation (NCC). First, we use three convolutional layers of VGGNet as the basis for fusion, and reduce the size of the convolutional layers based on a convolution kernel. Then, we resize the convolutional layers to be the same size as the deconvolutional layers for layer fusion. Next, we fuse the three layers based on NCC between the target and search region, and produce the response map. Finally, we get the tracking result from the response map by the maximum response. Various experiments on large-scale data sets verify that the proposed method is robust to occlusion, deformation, motion blur, and background clutter as well as outperforms state-of-the-art trackers in terms of distance precision and overlap success.

[1]  Shai Avidan,et al.  Locally Orderless Tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Lei Zhang,et al.  Real-Time Compressive Tracking , 2012, ECCV.

[3]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Bin Shen,et al.  Online robust image alignment via iterative convex optimization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Huchuan Lu,et al.  Visual tracking via adaptive structural local sparse appearance model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Narendra Ahuja,et al.  Robust visual tracking via multi-task sparse learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[11]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[13]  David Zhang,et al.  Fast Visual Tracking via Dense Spatio-temporal Context Learning , 2014, ECCV.

[14]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Huchuan Lu,et al.  Robust object tracking via sparsity-based collaborative model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.