Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification

Deep-learning is a cutting edge theory that is being applied to many fields. For vision applications the Convolutional Neural Networks (CNN) are demanding significant accuracy for classification tasks. Numerous hardware accelerators have populated during the last years to improve CPU or GPU based solutions. This technology is commonly prototyped and tested over FPGAs before being considered for ASIC fabrication for mass production. The use of commercial typical cameras (30fps) limits the capabilities of these systems for high speed applications. The use of dynamic vision sensors (DVS) that emulate the behavior of a biological retina is taking an incremental importance to improve this applications due to its nature, where the information is represented by a continuous stream of spikes and the frames to be processed by the CNN are constructed collecting a fixed number of these spikes (called events). The faster an object is, the more events are produced by DVS, so the higher is the equivalent frame rate. Therefore, these DVS utilization allows to compute a frame at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined design for FPGA able to collect events from an Address-Event-Representation (AER) DVS retina to obtain a normalized histogram to be used by a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS for computation blocks, which are used to perform the normalization of a frame needed for the CNN. Results outperform previous implementations of frames collection and normalization using ARM processors running at 800MHz on a Zynq7100 in both latency and power consumption. A measured 67% speedup factor is presented for a Roshambo CNN real-time experiment running at 160fps peak rate.

[1]  Hoi-Jun Yoo,et al.  UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Vivienne Sze,et al.  14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[4]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  T. Delbruck,et al.  > Replace This Line with Your Paper Identification Number (double-click Here to Edit) < 1 , 2022 .

[7]  Marian Verhelst,et al.  A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[8]  Alessandro Aimar,et al.  NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[9]  E. Culurciello,et al.  NeuFlow: Dataflow vision processing system-on-a-chip , 2012, 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS).

[10]  Leibo Liu,et al.  A 1.06-to-5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning applications , 2017, 2017 Symposium on VLSI Circuits.

[11]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[12]  Tobi Delbrück,et al.  A 128$\times$ 128 120 dB 15 $\mu$s Latency Asynchronous Temporal Contrast Vision Sensor , 2008, IEEE Journal of Solid-State Circuits.

[13]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[14]  Bernabé Linares-Barranco,et al.  On Multiple AER Handshaking Channels Over High-Speed Bit-Serial Bidirectional LVDS Links With Flow-Control and Clock-Correction on Commercial FPGAs for Scalable Neuromorphic Systems , 2017, IEEE Transactions on Biomedical Circuits and Systems.

[15]  Ke Wang,et al.  AI Benchmark: Running Deep Neural Networks on Android Smartphones , 2018, ECCV Workshops.

[16]  Tobi Delbrück,et al.  A 5 Meps $100 USB2.0 Address-Event Monitor-Sequencer Interface , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[17]  Ricardo Tapiador-Morales,et al.  Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator , 2018, 2018 IEEE 18th International Conference on Nanotechnology (IEEE-NANO).

[18]  Hoi-Jun Yoo,et al.  14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[19]  Daniel Gutierrez-Galan,et al.  A 20Mevps/32Mev event-based USB framework for neuromorphic systems debugging , 2016, 2016 Second International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP).

[20]  Marian Verhelst,et al.  14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[21]  Tobi Delbrück,et al.  A USB3.0 FPGA event-based filtering and tracking framework for dynamic vision sensors , 2015, 2015 IEEE International Symposium on Circuits and Systems (ISCAS).

[22]  Jun-Seok Park,et al.  14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[23]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).