Robust RGB-D tracking via compact CNN features

Abstract

Feature representation is at the core of visual tracking. This paper presents a robust tracking method for RGB-D videos. First, the RGB and depth images are encoded separately using hierarchical convolutional neural network (CNN) features. Second, to reduce the computational cost, we exploit random projection to compress the CNN features: the high-dimensional CNN features are randomly projected into a low-dimensional feature space. The correlation filter tracking framework is then applied independently to the RGB and depth images, and a backward tracking scheme is adopted to evaluate the tracking results in each of the two channels. The final position is determined from the locations tracked in the two image channels. In addition, the model is updated adaptively. Our tracker is evaluated on two RGB-D benchmark datasets and achieves results comparable to those of other state-of-the-art RGB-D tracking methods.
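The abstract does not say how the projection matrix is built. As a minimal sketch, assume a dense Gaussian random matrix applied along the channel axis of each CNN feature map. This is one common choice; sparse random matrices are another. The function names `make_projection` and `compress_features`, the output dimension, and the fixed seed are illustrative assumptions, not the paper's settings. Keeping the seed fixed means every frame is projected with the same matrix, so filters learned on earlier frames stay comparable with features from later frames.

```python
import numpy as np

def make_projection(d_in, d_out, seed=0):
    """Hypothetical helper: Gaussian random projection matrix (d_in x d_out).

    Entries are drawn from N(0, 1/d_out), so squared Euclidean norms are
    preserved in expectation (Johnson-Lindenstrauss-style scaling).
    A fixed seed gives the same matrix for every frame.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))

def compress_features(feat, proj):
    """Project an H x W x D feature map to H x W x k, one pixel vector at a time."""
    h, w, d = feat.shape
    flat = feat.reshape(-1, d)   # (H*W) x D
    low = flat @ proj            # (H*W) x k
    return low.reshape(h, w, proj.shape[1])
```

For example, compressing a 512-channel feature map to 64 channels means eight times fewer per-channel FFTs in the later correlation step, while the 1/sqrt(k) scaling keeps squared feature norms unchanged on average.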
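The abstract names the correlation filter framework but does not give its exact formulation. The sketch below assumes a multi-channel linear correlation filter solved in closed form in the Fourier domain. It uses ridge regression with a denominator shared across channels, in the style of DSST-like trackers, and could be run once on the RGB features and once on the depth features. `gaussian_label`, `train_filter`, `detect`, the regularizer `lam`, and the label width `sigma` are hypothetical names and values.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a 2-D Gaussian peaked at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_filter(x, y, lam=1e-4):
    """Closed-form multi-channel linear correlation filter (assumed formulation).

    x: H x W x D feature patch, y: H x W desired response.
    W^d = Y * conj(X^d) / (sum_k |X^k|^2 + lam), computed per frequency.
    """
    X = np.fft.fft2(x, axes=(0, 1))
    Y = np.fft.fft2(y)
    energy = np.sum(X * np.conj(X), axis=2).real  # shared denominator over channels
    return Y[..., None] * np.conj(X) / (energy[..., None] + lam)

def detect(W, z):
    """Apply the filter to a new patch z; return the response map and
    the peak displacement (dy, dx) relative to the patch centre."""
    Z = np.fft.fft2(z, axes=(0, 1))
    response = np.real(np.fft.ifft2(np.sum(W * Z, axis=2)))
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(py) - h // 2, int(px) - w // 2)
```

On the training patch itself the response reproduces the Gaussian label almost exactly. If the target content shifts circularly by (3, -2) pixels inside the window, the response peak moves by the same displacement, which is how the new position is read off.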
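The abstract also leaves the backward check, the fusion rule, and the adaptive update unspecified. One plausible reading is a forward-backward consistency test. After a channel is tracked from frame t-1 to frame t, the tracker is run backward from the new estimate to frame t-1, and the distance to the known previous position serves as a reliability score. The sketch below works under that assumption. The inverse-error weighting in `fuse_positions`, the thresholded learning rate in `adaptive_rate`, and all names and constants are hypothetical illustrations, not the paper's method.

```python
import math

def fb_error(p_prev, p_back):
    """Forward-backward error: distance between the known previous position and
    the position recovered by tracking the new estimate one frame backward."""
    return math.dist(p_prev, p_back)

def fuse_positions(p_rgb, err_rgb, p_depth, err_depth, eps=1e-6):
    """Hypothetical fusion rule: weight each channel's estimate by the inverse
    of its backward-tracking error, so the more consistent channel dominates."""
    w_rgb = 1.0 / (err_rgb + eps)
    w_depth = 1.0 / (err_depth + eps)
    total = w_rgb + w_depth
    return tuple((w_rgb * a + w_depth * b) / total for a, b in zip(p_rgb, p_depth))

def adaptive_rate(err, base_rate=0.01, threshold=5.0):
    """Hypothetical adaptive update: use the base learning rate when the
    backward check passes, and freeze the model (rate 0) when it fails."""
    return base_rate if err < threshold else 0.0

def update_model(model, new_model, rate):
    """Linear interpolation update common in correlation-filter trackers
    (element-wise over lists here; the same formula applies to arrays)."""
    return [(1.0 - rate) * m + rate * n for m, n in zip(model, new_model)]
```

With equal backward errors the fused position is the midpoint of the two estimates. When one channel fails the check, its large error drives its weight toward zero, and its model is left unchanged for that frame.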
