Paper information - Global-local Feature Aggregation for Event-based Object Detection on EventKITTI

Event sequences convey asynchronous pixel-wise visual information with low power consumption and high temporal resolution, which enables more robust perception under challenging conditions such as fast motion. Two main factors limit the development of event-based object detection in traffic scenes: the lack of high-quality datasets and the lack of effective event-based algorithms. To address the first problem, we propose a simulated event-based detection dataset named EventKITTI, which incorporates the novel event modality into a mixed two-level (i.e., object-level and video-level) detection dataset for traffic scenarios. EventKITTI provides a high-quality event stream and the largest number of categories, at microsecond temporal resolution and 1242×375 spatial resolution, exceeding existing datasets. As for the second problem, existing algorithms rely on CNN-based, spiking, or graph architectures to capture local features of moving objects, leading to poor performance on objects with incomplete contours. Hence, we propose two event-based object detectors named GFA-Net and CGFA-Net. To enhance global-local learning in the spatial dimension, GFA-Net introduces a transformer with edge-based position encoding and multi-scale feature fusion to detect objects on static frames. Furthermore, CGFA-Net optimizes the edge-based position encoding with closed-loop learning based on the previously detected heatmap, which aggregates temporal global features across event frames. The proposed event-based object detectors achieve the best speed-accuracy trade-off on EventKITTI, reaching 81.3% mAP at 33.0 FPS on the object-level detection dataset and 64.5% mAP at 30.3 FPS on the video-level detection dataset.
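The abstract refers to "event frames" without saying how they are built from the event stream. As a minimal sketch only, and not the authors' method, the Python below shows one common way to turn an asynchronous stream into a frame: sum event polarities per pixel over a fixed time window. The event tuple layout `(x, y, timestamp_us, polarity)`, the function name `events_to_frame`, and the 33 ms window are assumptions made for illustration. Only the 1242×375 resolution and the microsecond timestamps come from the abstract.

```python
from typing import Iterable, List, Tuple

# Hypothetical event layout (an assumption, not taken from the paper):
# (x, y, timestamp in microseconds, polarity in {+1, -1}).
Event = Tuple[int, int, int, int]

WIDTH, HEIGHT = 1242, 375  # spatial resolution stated in the abstract


def events_to_frame(
    events: Iterable[Event],
    t_start_us: int,
    t_end_us: int,
    width: int = WIDTH,
    height: int = HEIGHT,
) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start_us, t_end_us)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t_us, polarity in events:
        in_window = t_start_us <= t_us < t_end_us
        in_bounds = 0 <= x < width and 0 <= y < height
        if in_window and in_bounds:
            frame[y][x] += polarity
    return frame


# Four toy events; the last one falls outside the 33 ms window and is ignored.
events = [(10, 5, 100, 1), (10, 5, 250, 1), (11, 5, 300, -1), (10, 5, 40_000, 1)]
frame = events_to_frame(events, t_start_us=0, t_end_us=33_000)
print(frame[5][10], frame[5][11])  # prints "2 -1"
```

The window length sets the trade-off between temporal resolution and how complete object contours appear in each frame. The 33 ms value here is arbitrary.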

G. Chen | Hu Cao | Zichen Liang | Chu Yang | Zikai Zhang
