Global-local Feature Aggregation for Event-based Object Detection on EventKITTI

Event sequences convey asynchronous pixel-wise visual information at low power and high temporal resolution, which enables more robust perception under challenging conditions such as fast motion. Two main factors limit the development of event-based object detection in traffic scenes: the lack of high-quality datasets and the lack of effective event-based algorithms. To address the first problem, we propose a simulated event-based detection dataset named EventKITTI, which incorporates the event modality into a mixed two-level (i.e., object-level and video-level) detection dataset for traffic scenarios. EventKITTI offers a high-quality event stream and the largest number of categories, at microsecond temporal resolution and 1242×375 spatial resolution, exceeding existing datasets. As for the second problem, existing algorithms rely on CNN-based, spiking, or graph architectures to capture local features of moving objects, leading to poor performance on objects with incomplete contours. Hence, we propose two event-based object detectors, GFA-Net and CGFA-Net. To enhance global-local learning in the spatial dimension, GFA-Net introduces a transformer with edge-based position encoding and multi-scale feature fusion to detect objects on a static frame. Furthermore, CGFA-Net optimizes the edge-based position encoding with closed-loop learning based on the previously detected heatmap, which aggregates temporal global features across event frames. The proposed event-based object detectors achieve the best speed-accuracy trade-off on EventKITTI, with 81.3% mAP at 33.0 FPS on the object-level detection dataset and 64.5% mAP at 30.3 FPS on the video-level detection dataset.
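As a rough illustration of what an "event frame" can look like, the following minimal sketch accumulates a window of events into a two-channel count image at the 1242×375 resolution mentioned above. The event layout `(t_us, x, y, polarity)`, the function name `events_to_frame`, and the two-channel count representation are assumptions made for illustration; the abstract does not specify how GFA-Net or CGFA-Net build their input frames.

```python
import numpy as np

# Spatial resolution stated for EventKITTI.
WIDTH, HEIGHT = 1242, 375


def events_to_frame(events, t_start, t_end, width=WIDTH, height=HEIGHT):
    """Accumulate events into a (2, height, width) count frame.

    Assumed (hypothetical) event layout: rows of (t_us, x, y, polarity),
    where polarity > 0 marks a brightness increase.
    Channel 0 counts negative events, channel 1 counts positive events.
    """
    events = np.asarray(events, dtype=np.int64).reshape(-1, 4)
    t, x, y, p = events.T

    # Keep only events inside the half-open time window [t_start, t_end).
    mask = (t >= t_start) & (t < t_end)
    channel = (p[mask] > 0).astype(np.int64)

    frame = np.zeros((2, height, width), dtype=np.int32)
    # np.add.at handles repeated (channel, y, x) indices correctly,
    # unlike plain fancy-index assignment.
    np.add.at(frame, (channel, y[mask], x[mask]), 1)
    return frame


if __name__ == "__main__":
    demo = [(0, 10, 20, 1), (5, 10, 20, 1), (7, 3, 4, 0), (100, 1, 1, 1)]
    frame = events_to_frame(demo, t_start=0, t_end=50)
    print(frame.shape, int(frame.sum()))
```

A fixed time window is only one possible choice; windows with a fixed event count or time-surface representations are common alternatives.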
