Hierarchical spatial-aware Siamese network for thermal infrared object tracking

Abstract: Most thermal infrared (TIR) tracking methods are discriminative, treating the tracking problem as a classification task. However, the objective of the classifier (label prediction) is not coupled to the objective of the tracker (location estimation). The classification task focuses on between-class differences among arbitrary objects, while the tracking task mainly deals with within-class differences of the same object. In this paper, we cast the TIR tracking problem as a similarity verification task, which is well coupled to the objective of the tracking task. We propose a TIR tracker via a Hierarchical Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To obtain both spatial and semantic features of the TIR object, we design a Siamese CNN that coalesces multiple hierarchical convolutional layers. Then, we propose a spatial-aware network to enhance the discriminative ability of the coalesced hierarchical feature. Subsequently, we train this network end to end on a large visible-spectrum video detection dataset to learn the similarity between paired objects before transferring the network to the TIR domain. Next, this pre-trained Siamese network is used to evaluate the similarity between the target template and target candidates. Finally, we locate the candidate that is most similar to the tracked target. Extensive experimental results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed method achieves favorable performance compared to the state-of-the-art methods.
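
The final two steps described above (scoring each candidate against the target template, then picking the most similar one) can be sketched as follows. This is a minimal illustration under stated assumptions, not HSSNet itself: the feature maps are plain NumPy arrays standing in for the output of the Siamese CNN, cosine similarity is an assumed choice of similarity measure, and the function names `cosine_similarity_map` and `locate_target` are hypothetical.

```python
import numpy as np


def cosine_similarity_map(template, search):
    """Score every template-sized window of `search` against `template`.

    template: array of shape (C, h, w), stand-in for the template features.
    search:   array of shape (C, H, W), stand-in for the search-region features.
    Returns an array of shape (H - h + 1, W - w + 1) of cosine similarities.
    Cosine similarity is an assumption; the abstract does not name the measure.
    """
    _, h, w = template.shape
    _, H, W = search.shape
    # Normalise the template once; the small constant avoids division by zero.
    t = template / (np.linalg.norm(template) + 1e-12)
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search[:, i:i + h, j:j + w]
            scores[i, j] = np.sum(t * window) / (np.linalg.norm(window) + 1e-12)
    return scores


def locate_target(template, search):
    """Return the (row, col) offset of the window most similar to the template."""
    scores = cosine_similarity_map(template, search)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)
```

Calling `locate_target(template, search)` returns the top-left offset, in feature-map coordinates, of the best-matching candidate window. In a real tracker both inputs would come from the same shared-weight network applied to the template patch and the search region, and the offset would be mapped back to image coordinates.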
