Correlation filter-based visual tracking via adaptive weighted CNN features fusion

Visual object tracking is an important and challenging task in computer vision. In this study, the authors propose a visual tracking approach that decomposes the tracking task into two subproblems: translation estimation and scale estimation. For translation estimation, they employ multiple adaptive correlation filters built on features from different layers of a hierarchical convolutional neural network (CNN) to estimate the target location more accurately. To make full use of these multi-level features, they propose an adaptive weighting algorithm that fuses the correlation response maps produced by the individual filters. For scale estimation, a one-dimensional correlation filter with histogram of oriented gradients (HOG) features is employed to estimate scale variation. Extensive experiments on 50 challenging benchmark video sequences demonstrate that the proposed algorithm outperforms state-of-the-art tracking algorithms.
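The abstract does not spell out the fusion rule, so the following is only a minimal NumPy sketch of the translation-estimation idea under stated assumptions. One correlation filter is trained per layer using a standard closed-form multi-channel formulation (numerator and denominator in the Fourier domain). Random arrays stand in for CNN features. Each layer's response map is weighted by a simplified peak-to-sidelobe ratio before the maps are summed. The weighting rule and all function names (`gaussian_label`, `train_filter`, `detect`, `confidence`, `fuse`, `estimate_shift`) are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired correlation output: a 2-D Gaussian peaked at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))

def train_filter(feat, label, lam=1e-2):
    """Closed-form multi-channel correlation filter for one (hypothetical) layer.

    feat: (h, w, c) feature map; label: (h, w) desired response.
    Returns a per-channel numerator and a shared real-valued denominator.
    """
    F = np.fft.fft2(feat, axes=(0, 1))
    G = np.fft.fft2(label)
    num = G[..., None] * np.conj(F)              # per-channel numerator
    den = np.sum(np.abs(F) ** 2, axis=2) + lam   # energy summed over channels + regulariser
    return num, den

def detect(num, den, feat):
    """Correlation response of a learned filter on a new feature map."""
    Z = np.fft.fft2(feat, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(num * Z, axis=2) / den))

def confidence(resp, eps=1e-12):
    """ASSUMPTION: simplified peak-to-sidelobe ratio, (max - mean) / std over the whole map."""
    return (resp.max() - resp.mean()) / (resp.std() + eps)

def fuse(responses):
    """ASSUMED adaptive weighting: normalised confidences, then a weighted sum of maps."""
    scores = np.array([confidence(r) for r in responses])
    weights = scores / scores.sum()
    fused = sum(wt * r for wt, r in zip(weights, responses))
    return fused, weights

def estimate_shift(fused):
    """Translation = location of the fused peak relative to the patch centre."""
    h, w = fused.shape
    py, px = np.unravel_index(np.argmax(fused), fused.shape)
    return int(py - h // 2), int(px - w // 2)

# Toy demo: random arrays stand in for features from three hypothetical CNN layers.
rng = np.random.default_rng(0)
h, w, c = 32, 32, 8
label = gaussian_label(h, w)
train_feats = [rng.standard_normal((h, w, c)) for _ in range(3)]
filters = [train_filter(f, label) for f in train_feats]

# Next frame: the target moves by (+3, -2) in the first two layers;
# the third layer is replaced by unrelated noise (an unreliable layer).
test_feats = [np.roll(f, shift=(3, -2), axis=(0, 1)) for f in train_feats[:2]]
test_feats.append(rng.standard_normal((h, w, c)))

responses = [detect(n, d, f) for (n, d), f in zip(filters, test_feats)]
fused, weights = fuse(responses)
dy, dx = estimate_shift(fused)
print(dy, dx, np.round(weights, 2))
```

In this toy setting the fused peak recovers the simulated shift, and the noisy layer receives a smaller weight because its response map has no sharp peak. Scale estimation with a one-dimensional HOG-based filter is not covered by this sketch.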