CEAFFOD: Cross-Ensemble Attention-based Feature Fusion Architecture Towards a Robust and Real-time UAV-based Object Detection in Complex Scenarios

Deploying object detectors on embedded platforms such as unmanned aerial vehicles (UAVs) is challenging for two reasons: the UAV itself has limited on-board computation and memory, and the captured visual data varies widely in object scale, orientation, density, viewpoint, distribution, shape and context. To be applicable, the detector must be robust and accurate, fast enough for real-time inference, and lightweight. Inspired by the YOLO architecture, we propose a novel single-stage detection architecture with three contributions. First, a feature fusion spatial pyramid pooling (FFSPP) block applies attention-based feature fusion across both time and space, efficiently exploiting the information in subsequent frames and scales. Second, a multi-dilated attention-based cross-stage partial connection (MDACSP) block enlarges the receptive field and produces per-channel modulation weights after aggregating the feature maps across their spatial domain. Third, a scaled feature fusion head (SFFH) fuses the FFSPP block features with the MDACSP block features connected to that head. For more robust results across different scenarios, we perform cross-ensembling with three of the top UAV/traffic surveillance datasets: UAVDT, UA-DETRAC and VisDrone. Our ablation study shows how each contribution improves over the baseline. Our approach achieves state-of-the-art results on all three datasets, with 89.3% mAP, 93.5% mAP and 42.9% mAP, respectively. Testing the model on an NVIDIA Jetson Xavier NX board shows a desirable balance between inference time and memory cost. We also show qualitatively the robustness and efficiency of the model across the diverse, complex scenarios of these datasets. We hope this work facilitates the advancement of UAV-based perception in such crucial industrial applications.
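The per-channel modulation described for the MDACSP block (aggregate each feature map over its spatial domain, then derive one weight per channel) resembles squeeze-and-excitation style channel attention. The following is a minimal pure-Python sketch of that general idea only, not the architecture's actual implementation: the function name `channel_attention`, the bottleneck weight matrices `w1` and `w2`, and the choice of ReLU followed by a sigmoid are all assumptions made for illustration.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Illustrative squeeze-and-excitation style channel gating.

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: hypothetical (R x C) bottleneck weights, w2: hypothetical (C x R) weights.
    Returns (gates, scaled_feature_map).
    """
    # Squeeze: aggregate each channel across its spatial domain (global average).
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite, step 1: project to the bottleneck and apply ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excite, step 2: project back to C values and squash to (0, 1) with a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Modulate: rescale every spatial position of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return gates, scaled


# Toy input: two 2x2 channels and hand-picked (not learned) weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
gates, scaled = channel_attention(features, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
```

In a trained network the two weight matrices would be learned, and the gating would typically be implemented with tensor operations rather than nested lists; the list form is used here only to keep the sketch dependency-free.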
