DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

Existing nighttime unmanned aerial vehicle (UAV) trackers follow an "Enhance-then-Track" architecture: they first use a light enhancer to brighten the nighttime video, then employ a daytime tracker to locate the object. Because enhancement and tracking remain separate stages, this design does not yield an end-to-end trainable vision system. To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts. Without a separate enhancer, DCPT directly encodes anti-dark capabilities into prompts using a darkness clue prompter (DCP). Specifically, DCP iteratively learns emphasizing and undermining projections for darkness clues. It then injects these learned visual prompts across the transformer layers of a daytime tracker whose parameters are kept fixed. Moreover, a gated feature aggregation mechanism enables adaptive fusion among the prompts themselves and between the prompts and the base model. Extensive experiments show state-of-the-art performance for DCPT on multiple dark scenario benchmarks. The unified end-to-end learning of enhancement and tracking in DCPT enables a more trainable system. The darkness clue prompting efficiently injects anti-dark knowledge without extra modules. Code is available at https://github.com/bearyi26/DCPT.
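
To make the prompt-injection idea concrete, the following is a minimal illustrative sketch, not the authors' implementation (for that, see the repository linked above). It assumes that a gate is a sigmoid of a learnable scalar and that a prompt is added to the tokens before each frozen layer. The names `gated_fuse`, `frozen_layer`, and `forward` are hypothetical, the frozen layer is a toy stand-in for a transformer block, and the emphasizing and undermining projections of DCP are not modelled.

```python
import math


def sigmoid(z):
    """Logistic function, used here to squash a gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(feature, prompt, gate_logit):
    """Add a prompt to a feature vector, scaled by a learnable gate.

    Assumption: the gate is a single scalar per layer. A gate near 0
    leaves the base feature untouched; a gate near 1 adds the full prompt.
    """
    g = sigmoid(gate_logit)
    return [f + g * p for f, p in zip(feature, prompt)]


def frozen_layer(tokens):
    """Toy stand-in for a frozen transformer block (no trainable state)."""
    return [0.5 * t for t in tokens]


def forward(tokens, prompts, gate_logits):
    """Pass tokens through the frozen layers, injecting one gated prompt per layer.

    Only `prompts` and `gate_logits` would be trained in this sketch;
    `frozen_layer` stays fixed, mirroring a base tracker with fixed parameters.
    """
    for prompt, logit in zip(prompts, gate_logits):
        tokens = gated_fuse(tokens, prompt, logit)
        tokens = frozen_layer(tokens)
    return tokens


if __name__ == "__main__":
    tokens = [1.0, 2.0]
    prompts = [[1.0, 1.0], [1.0, 1.0]]
    # Gate logit 0 gives a gate of 0.5, so half of each prompt is injected.
    print(forward(tokens, prompts, [0.0, 0.0]))
    # Very negative logits close the gate, recovering the frozen-only path.
    print(forward(tokens, prompts, [-50.0, -50.0]))
```

With the gates closed, the output matches what the frozen layers alone would produce, which is the property that lets a prompt-based design start from the behaviour of the daytime tracker and learn only the night-specific correction.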
