Paper Information - Foreground-Background Distribution Modeling Transformer for Visual Object Tracking

Foreground-Background Distribution Modeling Transformer for Visual Object Tracking

Visual object tracking is a fundamental research topic with a broad range of applications. Benefiting from the rapid development of Transformers, pure Transformer trackers have made great progress. However, the feature learning of these Transformer-based trackers is easily disturbed by complex backgrounds. To address this limitation, we propose a novel foreground-background distribution modeling transformer for visual object tracking (F-BDMTrack), which comprises a fore-background agent learning (FBAL) module and a distribution-aware attention (DA2) module in a unified transformer architecture. F-BDMTrack has several merits. First, the FBAL module effectively mines fore-background information with purpose-designed fore-background agents. Second, the DA2 module suppresses incorrect interaction between foreground and background by modeling fore-background distribution similarities. Finally, F-BDMTrack extracts discriminative features under ever-changing tracking scenarios for more accurate target state estimation. Extensive experiments show that F-BDMTrack outperforms previous state-of-the-art trackers on eight tracking benchmarks.
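The abstract gives no formulas, so the following is only a rough sketch of how attention might be biased by fore-background distribution similarity; it is not the paper's DA2 module. Everything in it is an assumption made for illustration: the function name `distribution_aware_attention`, the use of fixed agent vectors passed in as inputs (the paper learns them in FBAL), the cosine similarity between per-token agent distributions, and the choice to add the log of that similarity to the attention logits.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distribution_aware_attention(tokens, fg_agents, bg_agents):
    """Illustrative self-attention biased by fore-background distributions.

    tokens:    (N, D) token features
    fg_agents: (Kf, D) hypothetical foreground agent vectors
    bg_agents: (Kb, D) hypothetical background agent vectors
    Returns the attended features (N, D) and the attention map (N, N).
    """
    d = tokens.shape[1]
    agents = np.concatenate([fg_agents, bg_agents], axis=0)  # (K, D)

    # Assumption: describe each token by its soft assignment over all agents.
    dist = softmax(tokens @ agents.T / np.sqrt(d), axis=-1)  # (N, K)

    # Assumption: compare tokens by cosine similarity of those distributions.
    normed = dist / np.linalg.norm(dist, axis=-1, keepdims=True)
    sim = normed @ normed.T  # (N, N), values in (0, 1]

    # Ordinary scaled dot-product logits (queries = keys = values = tokens).
    logits = tokens @ tokens.T / np.sqrt(d)

    # Assumption: a log-similarity bias down-weights pairs of tokens whose
    # fore-background distributions disagree.
    attn = softmax(logits + np.log(sim + 1e-6), axis=-1)
    return attn @ tokens, attn
```

In this toy form, two tokens that both align with the foreground agents keep their attention weight, while a foreground-like token attending to a background-like token is penalized in proportion to how different their agent distributions are.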

Jianfeng He | Yinchao Ma | Da-Ming Yang | Tianzhu Zhang | Qianjin Yu
