Foreground-Background Distribution Modeling Transformer for Visual Object Tracking

Visual object tracking is a fundamental research topic with a broad range of applications. Benefiting from the rapid development of Transformer, pure Transformer trackers have achieved great progress. However, the feature learning of these Transformer-based trackers is easily disturbed by complex backgrounds. To address the above limitations, we propose a novel foreground-background distribution modeling transformer for visual object tracking (F-BDMTrack), including a fore-background agent learning (FBAL) module and a distribution-aware attention (DA2) module in a unified transformer architecture. The proposed F-BDMTrack enjoys several merits. First, the proposed FBAL module can effectively mine fore-background information with designed fore-background agents. Second, the DA2 module can suppress the incorrect interaction between foreground and background by modeling fore-background distribution similarities. Finally, F-BDMTrack can extract discriminative features under ever-changing tracking scenarios for more accurate target state estimation. Extensive experiments show that our F-BDMTrack outperforms previous state-of-the-art trackers on eight tracking benchmarks.

[1]  A. Ramanan,et al.  Transformers in Single Object Tracking: An Experimental Survey , 2023, IEEE Access.

[2]  Xuansong Xie,et al.  Procontext: Exploring Progressive Context Transformer for Tracking , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Junsong Yuan,et al.  AiATrack: Attention in Attention for Transformer Visual Tracking , 2022, ECCV.

[4]  D. Wang,et al.  Vision-Based Anti-UAV Detection and Tracking , 2022, IEEE Transactions on Intelligent Transportation Systems.

[5]  Junqing Yu,et al.  Transformer Tracking with Cyclic Shifting Window Attention , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yunhong Wang,et al.  SparseTT: Visual Tracking with Sparse Transformers , 2022, IJCAI.

[7]  S. Shan,et al.  Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework , 2022, ECCV.

[8]  L. Gool,et al.  Transforming Model Prediction for Tracking , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Limin Wang,et al.  MixFormer: End-to-End Tracking with Iterative Mixed Attention , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  L. Gool,et al.  Robust Visual Tracking by Segmentation , 2022, ECCV.

[11]  Wanli Ouyang,et al.  Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking , 2022, ECCV.

[12]  Yue Cao,et al.  Correlation-Aware Deep Tracking , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Haibin Ling,et al.  SwinTrack: A Simple and Strong Baseline for Transformer Tracking , 2021, NeurIPS.

[14]  Jianan Li,et al.  SiamSTA: Spatio-Temporal Attention based Siamese Tracker for Tracking UAVs , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[15]  Yong Wang,et al.  The Ninth Visual Object Tracking VOT2021 Challenge Results , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[16]  Lei Ma,et al.  Learning to Adversarially Blur Visual Object Tracking , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  S. Sclaroff,et al.  Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jiquan Ngiam,et al.  Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion Dataset , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Qingjie Liu,et al.  STMTrack: Template-free Visual Tracking with Space-time Memory Networks , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jianlong Fu,et al.  Learning Spatio-Temporal Transformer for Visual Tracking , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Yonghong Tian,et al.  Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Luc Van Gool,et al.  Learning Target Candidate Association to Keep Track of What Not to Track , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Huchuan Lu,et al.  Transformer Tracking , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  N. Codella,et al.  CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Quanfu Fan,et al.  CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Wengang Zhou,et al.  Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Shaikh Khadar Sharif,et al.  Object Detection and Tracking for Community Surveillance using Transfer Learning , 2021, 2021 6th International Conference on Inventive Computation Technologies (ICICT).

[28]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[29]  Lin Yuan,et al.  LaSOT: A High-quality Large-scale Single Object Tracking Benchmark , 2020, International Journal of Computer Vision.

[30]  Linyuan Wang,et al.  RPT: Learning Point Set Representation for Siamese Visual Tracking , 2020, ECCV Workshops.

[31]  Chinthaka Premachandra,et al.  Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving , 2020, IEEE Access.

[32]  Zhipeng Zhang,et al.  Ocean: Object-aware Anchor-free Tracking , 2020, ECCV.

[33]  Luc Van Gool,et al.  Probabilistic Regression for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  L. Gool,et al.  Know Your Surroundings: Exploiting Scene Information for Object Tracking , 2020, ECCV.

[35]  Shengping Zhang,et al.  Siamese Box Adaptive Network for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  R. Urtasun,et al.  Dense RepPoints: Representing Visual Objects with Dense Point Sets , 2019, ECCV.

[37]  Philip H. S. Torr,et al.  Siam R-CNN: Visual Tracking by Re-Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jiri Matas,et al.  D3S – A Discriminative Single Shot Segmentation Tracker , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ying Cui,et al.  SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Gang Yu,et al.  SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines , 2019, AAAI.

[41]  Ahmad Jalal,et al.  Multi-Person Tracking in Smart Surveillance System for Crowd Counting and Normal/Abnormal Events Detection , 2019, 2019 International Conference on Applied and Engineering Mathematics (ICAEM).

[42]  Stephen Lin,et al.  RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  L. Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Kaiqi Huang,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[50]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[51]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Linsong Cheng,et al.  ViTrack: Efficient Tracking on the Edge for Commodity Video Surveillance Systems , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.

[53]  Bernard Ghanem,et al.  TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[54]  Simon Lucey,et al.  Need for Speed: A Benchmark for Higher Frame Rate Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[55]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[58]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[59]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[60]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Jason Weston,et al.  Memory Networks , 2014, ICLR.

[62]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[63]  Junliang Xing,et al.  Anti-UAV: A Large-Scale Benchmark for Vision-Based UAV Tracking , 2023, IEEE Transactions on Multimedia.

[64]  Stephen Lin,et al.  Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[65]  Bin Li,et al.  Object Tracking Based on Meanshift and Particle-Kalman Filter Algorithm with Multi Features , 2019, ICCSCI.