Multiple-Model Fully Convolutional Neural Networks for Single Object Tracking on Thermal Infrared Video

The availability of affordable thermal infrared (TIR) camera has instigated its usage in various research fields, especially for the cases that require images to be captured in dark surroundings. One of the low-level tasks required by most TIR-based researches is the need to track an object throughout a video sequence. The main challenge posed by TIR camera usage is the lack of texture to differentiate two nearby objects of the same class. According to the VOT-TIR 2016 challenge, the best fully convolutional neural network (FCNN)-based tracker has only managed to obtain the third place. The discriminative ability of the FCNN tracker is not fully utilized because of the homogenous appearance pattern of the tracked object. This paper aims to improve FCNN-based tracker ability to predict object location through comprehensive sampling approach as well as better scoring scheme. Hence, a multiple-model FCNN is proposed, in which a small set of fully connected layers is updated on the top of pre-trained convolutional neural networks. The possible object locations are generated based on a two-stage sampling that combines stochastically distributed samples and clustered foreground contour information. The best sample is selected according to a combined score of appearance similarity, predicted location, and model reliability. The small set of appearance models is updated by using positive and negative training samples, accumulated from two periods of time which are the recent and parent node intervals. To further improve training accuracy, the samples are generated according to a set of adaptive variances that depends on the trustworthiness of the tracker output. The results show an improvement over TCNN, an FCNN-based tracker that won the VOT 2016 challenge with the expected average overlap increasing from 0.248 to 0.257. The performance enhancement is attributed to the better robustness with a 20% reduction in tracking failure rate compared to the TCNN.

[1]  Neil J. Gordon,et al.  A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking , 2002, IEEE Trans. Signal Process..

[2]  Wolfgang Hübner,et al.  MAD for visual tracker fusion , 2016, Security + Defence.

[3]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Nenghai Yu,et al.  Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Aidong Men,et al.  Learning spatial-temporal consistent correlation filter for visual tracking , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[7]  Dit-Yan Yeung,et al.  Learning a Deep Compact Image Representation for Visual Tracking , 2013, NIPS.

[8]  Arcangelo Merla,et al.  Exploring the Use of Thermal Infrared Imaging in Human Stress Research , 2014, PloS one.

[9]  Anne E. Goodenough,et al.  Can Handheld Thermal Imaging Technology Improve Detection of Poachers in African Bushveldt? , 2015, PloS one.

[10]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[11]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Ming Tang,et al.  Multi-kernel Correlation Filter for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Michael Felsberg,et al.  The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results , 2015, ICCV Workshops.

[16]  Hongdong Li,et al.  Beyond Local Search: Tracking Objects Everywhere with Instance-Specific Proposals , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Michael Felsberg,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[18]  Qiang Wang,et al.  Robust Object Tracking Based on Temporal and Spatial Deep Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[20]  Bohyung Han,et al.  BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Margrit Betke,et al.  Applications of thermal infrared imaging for research in aeroecology. , 2008, Integrative and comparative biology.

[22]  Wongun Choi,et al.  Deep Network Flow for Multi-object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Matej Kristan,et al.  Deformable Parts Correlation Filters for Robust Visual Tracking , 2016, IEEE Transactions on Cybernetics.

[24]  Xia Li,et al.  Kernalised Multi-resolution Convnet for Visual Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[26]  Qi Tian,et al.  Geometric Hypergraph Learning for Visual Tracking , 2016, IEEE Transactions on Cybernetics.

[27]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[28]  C. Corsi Infrared: A Key Technology for Security Systems , 2012 .

[29]  Xiaogang Wang,et al.  STCT: Sequentially Training Convolutional Networks for Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Zhenyu He,et al.  A multi-view model for visual tracking via correlation filters , 2016, Knowl. Based Syst..

[31]  Qingming Huang,et al.  Hedged Deep Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bohyung Han,et al.  Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[34]  Bohyung Han,et al.  Modeling and Propagating CNNs in a Tree Structure for Visual Tracking , 2016, ArXiv.

[35]  Justyna Cilulko,et al.  Infrared thermal imaging in studies of wild animals , 2013, European Journal of Wildlife Research.

[36]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[37]  Jianke Zhu,et al.  A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration , 2014, ECCV Workshops.

[38]  William Moran,et al.  Robust hierarchical multiple hypothesis tracker for multiple object tracking , 2012, ICIP.

[39]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).