A neural network approach to visual tracking

Fully Convolutional Networks (FCNs) have been shown to be effective for semantic segmentation when classification networks are fine-tuned on segmentation data. In this paper, we show that FCNs can be further fine-tuned on target-background images to solve visual tracking problems. Pixel-level models (FCNs) trained on segmentation data are superior to class-level models (e.g. VGG net and GoogLeNet) in visual tracking tasks because of their strong ability to discriminate between objects and background. Our work is based on an FCN structure. The model is first fine-tuned on the first image of a sequence; tracking and updating are then carried out through the standard forward and backward passes of the neural network. The proposed model achieves high precision and tracking success rates on the online object tracking benchmark (OTB) data, which indicates that our approach is competitive with state-of-the-art approaches.
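The loop described in the abstract (fine-tune on the first frame, locate the target with a forward pass, update with backward passes) can be illustrated with a deliberately tiny sketch. It is not the paper's network: a single per-pixel logistic unit on grayscale intensity stands in for the FCN, the frames are synthetic, and every name (`forward`, `backward_step`, `centroid`, `make_frame`, `track`) as well as the learning rate and step counts are assumptions made for illustration only.

```python
import math

def forward(frame, w, b):
    """Forward pass: per-pixel target probability (a 1x1 'convolution' plus sigmoid)."""
    return [[1.0 / (1.0 + math.exp(-(w * px + b))) for px in row] for row in frame]

def backward_step(frame, mask, w, b, lr=1.0):
    """One gradient-descent step on the mean per-pixel cross-entropy loss."""
    prob = forward(frame, w, b)
    n = len(frame) * len(frame[0])
    gw = gb = 0.0
    for frow, mrow, prow in zip(frame, mask, prob):
        for px, m, p in zip(frow, mrow, prow):
            gw += (p - m) * px  # gradient w.r.t. the weight
            gb += (p - m)       # gradient w.r.t. the bias
    return w - lr * gw / n, b - lr * gb / n

def centroid(prob, thresh=0.5):
    """Centre (row, col) of pixels classified as target, or None if there are none."""
    pts = [(r, c) for r, row in enumerate(prob) for c, p in enumerate(row) if p > thresh]
    if not pts:
        return None
    return (sum(r for r, _ in pts) / len(pts), sum(c for _, c in pts) / len(pts))

def make_frame(size, top, left, side=2):
    """Synthetic frame: a bright square target (1.0) on a dark background (0.0)."""
    return [[1.0 if top <= r < top + side and left <= c < left + side else 0.0
             for c in range(size)] for r in range(size)]

def track(frames, first_mask, init_steps=300, update_steps=5):
    """Fine-tune on the first frame, then alternate forward (track) and backward (update)."""
    w, b = 0.0, 0.0
    for _ in range(init_steps):  # fine-tuning on the first frame
        w, b = backward_step(frames[0], first_mask, w, b)
    centres = []
    for frame in frames[1:]:
        prob = forward(frame, w, b)  # tracking: forward pass only
        centres.append(centroid(prob))
        # Updating: the thresholded prediction serves as a pseudo-label (an assumption).
        pseudo = [[1.0 if p > 0.5 else 0.0 for p in row] for row in prob]
        for _ in range(update_steps):
            w, b = backward_step(frame, pseudo, w, b)
    return centres

# Usage: the target starts at rows 1-2, cols 1-2, then moves in the later frames.
frames = [make_frame(6, 1, 1), make_frame(6, 2, 3), make_frame(6, 3, 3)]
centres = track(frames, first_mask=make_frame(6, 1, 1))
print(centres)
```

Updating on the model's own thresholded output is only one possible choice of update signal; the abstract does not specify how the paper builds its update targets, so that step should be read as a placeholder.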
