Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning

Object tracking in RGB-thermal (RGB-T) videos is increasingly used in many fields, owing to the all-weather, day-and-night operating capability of dual-modality imaging systems and the rapid development of low-cost, miniaturized infrared cameras. However, effectively fusing the two modalities to build a robust RGB-T tracker remains challenging. In this paper, we propose an RGB-T object tracking algorithm based on a modal-aware attention network and competitive learning (MaCNet), which comprises a feature extraction network, a modal-aware attention network, and a classification network. The feature extraction network adopts a two-stream structure to extract features from each modality's images. The modal-aware attention network takes the raw input data, builds an attention model that characterizes the importance of the different feature layers, and uses it to guide feature fusion, thereby enhancing the information interaction between the modalities. The classification network constructs a modality-egoistic loss through three parallel binary classifiers acting on the RGB branch, the thermal infrared branch, and the fusion branch, respectively. Guided by a competitive-learning training strategy, the entire network is fine-tuned toward the optimal fusion of the two modalities. Extensive experiments on several publicly available RGB-T datasets show that our tracker outperforms recent state-of-the-art RGB-T and RGB tracking approaches.
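To make the described pipeline concrete, the following is a minimal PyTorch sketch of how a MaCNet-style network could be wired together: a two-stream backbone, an attention module conditioned on the raw input pair, and three parallel binary classifiers whose losses are combined. All layer sizes, the attention design, the loss weighting, and every module name here are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of a MaCNet-style RGB-T tracking network (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamBackbone(nn.Module):
    """One small CNN per modality; thermal frames are assumed replicated to 3 channels."""
    def __init__(self, channels=64):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
        self.rgb_stream = stream()
        self.tir_stream = stream()

    def forward(self, rgb, tir):
        return self.rgb_stream(rgb), self.tir_stream(tir)

class ModalAwareAttention(nn.Module):
    """Predicts per-channel importance weights from the original (raw) input pair
    and reweights each modality's features before fusion."""
    def __init__(self, channels=64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(6, channels * 2, 1), nn.ReLU(),
            nn.Conv2d(channels * 2, channels * 2, 1), nn.Sigmoid(),
        )

    def forward(self, rgb, tir, f_rgb, f_tir):
        raw = torch.cat([rgb, tir], dim=1)   # condition on the raw data, per the abstract
        w = self.fc(self.pool(raw))          # (B, 2C, 1, 1) importance weights
        w_rgb, w_tir = w.chunk(2, dim=1)
        return f_rgb * w_rgb, f_tir * w_tir

class MaCNetSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.backbone = TwoStreamBackbone(channels)
        self.attention = ModalAwareAttention(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Three parallel binary (target vs. background) classifiers:
        # RGB branch, thermal infrared branch, and the fused branch.
        self.cls_rgb = nn.Linear(channels, 2)
        self.cls_tir = nn.Linear(channels, 2)
        self.cls_fused = nn.Linear(channels * 2, 2)

    def forward(self, rgb, tir):
        f_rgb, f_tir = self.backbone(rgb, tir)
        f_rgb, f_tir = self.attention(rgb, tir, f_rgb, f_tir)
        v_rgb = self.pool(f_rgb).flatten(1)
        v_tir = self.pool(f_tir).flatten(1)
        v_fused = torch.cat([v_rgb, v_tir], dim=1)
        return self.cls_rgb(v_rgb), self.cls_tir(v_tir), self.cls_fused(v_fused)

def modality_egoistic_loss(logits_rgb, logits_tir, logits_fused, labels):
    """Binary cross-entropy on all three branches; the fixed 0.5 weighting is a
    stand-in for the paper's competitive-learning strategy, in which the
    single-modality branches compete to steer the fused branch."""
    l_rgb = F.cross_entropy(logits_rgb, labels)
    l_tir = F.cross_entropy(logits_tir, labels)
    l_fused = F.cross_entropy(logits_fused, labels)
    return l_fused + 0.5 * (l_rgb + l_tir)
```

In a training step under these assumptions, candidate patches sampled around the previous target location would be passed through the network, all three losses computed, and gradients backpropagated through the shared attention module and backbone, so whichever modality classifier is currently stronger exerts more pull on the fused representation.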
