Neuromorphic Camera Denoising using Graph Neural Network-driven Transformers

Neuromorphic vision is a bio-inspired technology that has triggered a paradigm shift in the computer-vision community and serves as a key enabler for a wide range of applications. It offers significant advantages, including reduced power consumption, reduced processing requirements, and communication speed-ups. However, neuromorphic cameras suffer from significant measurement noise, which degrades the performance of event-based perception and navigation algorithms. In this paper, we propose a novel noise-filtration algorithm that eliminates events which do not represent real log-intensity variations in the observed scene. We employ a Graph Neural Network (GNN)-driven transformer algorithm, called GNN-Transformer, to classify every active event pixel in the raw stream as either real log-intensity variation or noise. Within the GNN, a message-passing framework, referred to as EventConv, captures the spatiotemporal correlation among events while preserving their asynchronous nature. We also introduce the Known-object Ground-Truth Labeling (KoGTL) approach for generating approximate ground-truth labels for event streams under various illumination conditions. KoGTL is used to generate labeled datasets from experiments recorded under challenging lighting conditions, including moonlight. These datasets are used to train and extensively test the proposed algorithm. When tested on unseen datasets, the proposed algorithm outperforms state-of-the-art methods by 12.0% in terms of filtration accuracy. Additional tests on publicly available datasets (the ETH-Zurich Color-DAVIS346 datasets) demonstrate the generalization capability of the proposed algorithm in the presence of illumination variations and different motion dynamics. Compared to state-of-the-art solutions, qualitative results confirm the superior capability of the proposed algorithm to eliminate noise while preserving meaningful events in the scene.
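The core idea described above, building a spatiotemporal graph over raw events and using message passing to separate real log-intensity changes from noise, can be illustrated with a minimal sketch. The function names, radius/time thresholds, and the neighbor-count decision rule below are all illustrative assumptions; the paper's EventConv layer and transformer classifier are learned components, not this hand-written heuristic.

```python
import numpy as np

def build_event_graph(events, radius=3.0, dt=0.01):
    """Connect events that are close in both space and time.
    events: (N, 4) array of (x, y, t, p). Returns an adjacency list.
    The radius/dt thresholds are illustrative, not the paper's values."""
    n = len(events)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = abs(events[i, 0] - events[j, 0])
            dy = abs(events[i, 1] - events[j, 1])
            dti = abs(events[i, 2] - events[j, 2])
            if dx <= radius and dy <= radius and dti <= dt:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def message_pass(events, neighbors):
    """One EventConv-style round (assumed mean aggregation): each event
    aggregates relative (dx, dy, dt, dp) offsets from its neighbors,
    so the feature reflects local spatiotemporal structure."""
    n = len(events)
    feats = np.zeros((n, 4))
    for i, nbrs in enumerate(neighbors):
        if nbrs:
            feats[i] = (events[nbrs] - events[i]).mean(axis=0)
    return feats

def classify(neighbors, min_support=2):
    """Toy stand-in for the learned classifier head: events supported by
    too few spatiotemporal neighbors are flagged as noise (False)."""
    return np.array([len(nbrs) >= min_support for nbrs in neighbors])
```

A correlated burst of events (e.g. from a moving edge) forms a well-connected subgraph and is kept, while an isolated event, typical of background-activity noise, has no neighbors and is filtered out. In the actual method, the learned message-passing and transformer layers replace the fixed thresholding shown here.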
