Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking

Visual object tracking is a key technology enabling autonomous navigation in unmanned aerial vehicles (UAVs). In recent years, with the rapid development of deep learning, tracking algorithms based on Siamese neural networks have received widespread attention. However, because of complex and diverse tracking scenarios, as well as limited onboard computational resources, most existing trackers struggle to maintain real-time, stable operation while improving tracking performance. Studying efficient and fast tracking frameworks, and enhancing the ability of algorithms to respond to complex scenarios, has therefore become crucial. To this end, this paper proposes a tracker based on cross-parallel attention and an efficient match transformer for aerial tracking (SiamEMT). First, we carefully design a cross-parallel attention mechanism that encodes global feature information and achieves cross-dimensional interaction and feature-correlation aggregation via parallel branches, highlighting feature saliency, reducing globally redundant information, and improving the tracker's ability to distinguish targets from background. Second, we implement an efficient match transformer for feature matching. This network uses parallel, lightweight, multi-head attention to pass template information into the search-region features, better matching the global similarity between the template and the search region and improving the algorithm's perception of target location and appearance. Experiments on multiple public UAV benchmarks verify the accuracy and robustness of the proposed tracker in aerial tracking scenarios. In addition, on the embedded artificial intelligence (AI) platform NVIDIA AGX Xavier, our algorithm achieves real-time tracking speed, indicating that it can be effectively applied to UAV tracking scenarios.
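The abstract describes the two modules only at a high level. As a rough illustration, the PyTorch sketch below shows one plausible reading: a cross-parallel attention block with two parallel branches (channel-wise and spatial) whose outputs are aggregated, and a lightweight cross-attention layer that injects template tokens into search-region tokens. The class names (CrossParallelAttention, EfficientMatchTransformer), branch designs, and all shapes and hyperparameters are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossParallelAttention(nn.Module):
    """Hypothetical sketch: two parallel branches capture cross-dimensional
    (channel <-> spatial) interactions; their re-weighted maps are aggregated."""

    def __init__(self, channels: int):
        super().__init__()
        # Branch 1: channel attention from globally pooled statistics.
        self.channel_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Branch 2: spatial attention over pooled channel descriptors.
        self.spatial_branch = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Run both branches in parallel, then aggregate by summation.
        ch = x * self.channel_branch(x)
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        sp = x * self.spatial_branch(pooled)
        return ch + sp


class EfficientMatchTransformer(nn.Module):
    """Hypothetical sketch: lightweight multi-head cross-attention that passes
    template information into the search-region features."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 2), nn.ReLU(inplace=True), nn.Linear(dim * 2, dim)
        )

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        # search: (B, Ns, C) search-region tokens; template: (B, Nt, C) tokens.
        fused, _ = self.attn(query=search, key=template, value=template)
        x = self.norm1(search + fused)
        return self.norm2(x + self.ffn(x))


# Illustrative shapes for a Siamese matching head (all sizes assumed).
emt = EfficientMatchTransformer(dim=256, heads=4)
z = torch.randn(1, 64, 256)    # 8x8 template feature map, flattened to tokens
x = torch.randn(1, 256, 256)   # 16x16 search-region feature map, as tokens
out = emt(x, z)                # (1, 256, 256): search features fused with template cues
```

Under this reading, matching cost scales with the product of template and search token counts per head, which is why a lightweight, parallel multi-head design plausibly supports real-time operation on embedded hardware such as the AGX Xavier.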
