On-Device Event Filtering with Binary Neural Networks for Pedestrian Detection Using Neuromorphic Vision Sensors

In this work, we present a hardware-efficient architecture for pedestrian detection with neuromorphic Dynamic Vision Sensors (DVSs), asynchronous camera sensors that report discrete changes in light intensity. Compared to traditional frame-based cameras, these sensors offer higher dynamic range, lower bandwidth requirements, and higher sampling frequency at lower power consumption. Our architecture is composed of two main components: an event filtering stage that denoises the input image stream, followed by a low-complexity neural network. For the first stage, we use a novel point-process filter (PPF) with an adaptive temporal windowing scheme that enhances classification accuracy. The second stage implements a hardware-efficient Binary Neural Network (BNN) for classification. To demonstrate the reduction in complexity achieved by our architecture, we showcase a Field-Programmable Gate Array (FPGA) implementation of the entire system, which obtains an 86% reduction in latency compared to current floating-point neural network architectures.
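To make the two-stage structure concrete, the following is a minimal Python sketch, not the implementation described in this work. It assumes events arrive as (x, y, t) tuples, uses a fixed temporal window rather than the adaptive scheme, and substitutes a simple neighbour-support filter for the PPF, whose details are not given here. The function names, parameters, and the {0, 1} encoding of {-1, +1} values are all illustrative assumptions. The last function shows the XNOR-and-popcount arithmetic that lets a binary layer avoid multipliers in hardware.

```python
import numpy as np

def events_to_binary_frame(events, height, width, t_start, window):
    """Mark every pixel that fired at least once in [t_start, t_start + window)."""
    frame = np.zeros((height, width), dtype=np.uint8)
    for x, y, t in events:
        if t_start <= t < t_start + window:
            frame[y, x] = 1
    return frame

def neighbor_filter(frame, min_neighbors=1):
    """Stand-in denoiser (NOT the PPF): keep an active pixel only if at
    least min_neighbors of its 8 neighbours are also active."""
    h, w = frame.shape
    padded = np.pad(frame, 1)  # zero border so edge pixels have 8 neighbours
    count = np.zeros((h, w), dtype=np.int32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            count += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return ((frame == 1) & (count >= min_neighbors)).astype(np.uint8)

def binary_dense(x_bits, w_bits):
    """Binary dense layer on {0,1}-encoded {-1,+1} values.

    x_bits has shape (n,), w_bits has shape (m, n). For +/-1 vectors the
    dot product equals 2 * (number of matching bits) - n, which is the
    XNOR-then-popcount form used in binary network hardware.
    """
    n = x_bits.shape[0]
    matches = (x_bits[None, :] == w_bits).sum(axis=1, dtype=np.int64)
    return 2 * matches - n
```

In a pipeline of this shape, the filtered frame would be flattened and passed through one or more binary layers, with the sign of each pre-activation giving the next layer's bits. How the window length should adapt to event rate is left open in this sketch.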
