Bayesian Dumbbell Diffusion Model for RGBT Object Tracking With Enriched Priors

RGBT tracking can be accomplished by constructing Bayesian estimators that incorporate fusion prior distributions for the visible (RGB) and thermal (T) modalities. Such estimators compute a posterior distribution over the variables of interest to locate the target. Incorporating rich prior information can improve predictor performance; however, current RGBT trackers have access to only limited fusion prior data. To mitigate this issue, we propose a novel tracker, BD$^{2}$Track, which employs a diffusion model. First, this letter introduces a dumbbell diffusion model and employs convolutional networks together with the dumbbell model to derive fusion-feature prior information from frames at different indices within the same tracking video sequence. Second, we propose a plug-and-play channel-augmented joint learning strategy to derive the image prior distribution. This strategy not only homogeneously generates modality-relevant prior information but also increases the distance between positive and negative samples within each modality while reducing the distance between modalities during fusion. Results on the GTOT, RGBT234, LasHeR, and VTUAV-ST datasets demonstrate promising performance, surpassing other state-of-the-art trackers.
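The diffusion component described above builds on the standard denoising diffusion probabilistic model (DDPM) machinery. As a rough illustration of that machinery only (not the paper's actual dumbbell architecture), the following sketch shows the closed-form forward noising step and one reverse denoising step applied to a stand-in fused feature vector; the schedule parameters, the feature dimension, and the use of the true noise as an oracle estimate are all illustrative assumptions:

```python
import numpy as np

# Minimal DDPM-style sketch: forward noising and one reverse step.
# All names and parameters here are illustrative, not from BD^2 Track.
T = 100                                 # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative product \bar{alpha}_t

def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

def reverse_step(xt, t, predicted_noise, rng):
    """One ancestral sampling step of p(x_{t-1} | x_t) given a noise estimate."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * predicted_noise) / np.sqrt(alphas[t])
    if t > 0:                           # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

rng = np.random.default_rng(0)
fused_feature = rng.standard_normal(256)   # stand-in for a fused RGB-T feature
xt, noise = forward_diffuse(fused_feature, t=50, rng=rng)
x_prev = reverse_step(xt, t=50, predicted_noise=noise, rng=rng)  # oracle noise
```

In a real tracker the oracle noise would be replaced by a learned denoising network conditioned on both modalities; this sketch only makes the probabilistic skeleton concrete.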
