LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism

High inference latency seriously limits the deployment of DNNs in real-time domains such as autonomous driving and robotic control. To address this challenge, researchers have proposed approximate DNNs with reduced precision, e.g., Binarized Neural Networks (BNNs). While BNNs can be built with little loss in accuracy, their latency still leaves much room for improvement. In this paper, we propose LP-BNN, a single-FPGA BNN accelerator that achieves microsecond-level, ultra-low-latency inference on ImageNet. We obtain this performance through several design optimizations. First, we optimize the network structure by removing the Batch Normalization (BN) functions, which introduce significant latency in BNNs, without any loss of accuracy. Second, we propose a parameterized architecture based on layer parallelism that supports nearly perfect load balancing for various types of BNNs. Third, we fuse all the convolution layers and the first fully connected layer, processing them in parallel through fine-grained inter-layer pipelining. With the proposed accelerator, inference of binarized AlexNet, VGGNet, and ResNet completes within 21.5 µs, 335 µs, and 67.8 µs, respectively, with no loss in accuracy compared with other BNN implementations.
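The BN-removal step can be made concrete. In a BNN, BN is followed by a sign() activation, and the activation's output depends only on whether the integer pre-activation of the binary convolution crosses a per-channel threshold; BN can therefore be folded away offline with no change to the network's output, a transformation commonly used in BNN accelerators. Below is a minimal NumPy sketch of this folding, under the standard BN formulation; the abstract does not give the paper's exact transformation, and the helper names (fold_bn_into_threshold, sign_pm1) are illustrative, not the paper's API.

```python
import numpy as np

def fold_bn_into_threshold(gamma, beta, mean, var, eps=1e-5):
    """Fold BN + sign() into a per-channel threshold.

    BN computes z = gamma * (y - mean) / sqrt(var + eps) + beta,
    followed by sign(z). sign(z) >= 0 iff
        y >= mean - beta * sqrt(var + eps) / gamma   (gamma > 0),
    with the comparison direction flipped when gamma < 0.
    """
    sigma = np.sqrt(var + eps)
    tau = mean - beta * sigma / gamma
    flip = gamma < 0  # channels where the comparison inverts
    return tau, flip

def sign_pm1(x):
    # sign() with sign(0) = +1, the usual BNN convention
    return np.where(x >= 0, 1, -1).astype(np.int8)

# Check that BN + sign equals the folded threshold comparison
# on illustrative per-channel BN parameters.
rng = np.random.default_rng(0)
gamma = np.array([1.2, -0.8, 0.5])
beta = np.array([0.1, 0.3, -0.2])
mean = np.array([4.0, 1.5, 2.0])
var = np.array([2.0, 0.5, 1.0])
y = rng.integers(-16, 17, size=(8, 3)).astype(np.float64)  # integer pre-activations

ref = sign_pm1(gamma * (y - mean) / np.sqrt(var + 1e-5) + beta)

tau, flip = fold_bn_into_threshold(gamma, beta, mean, var)
folded = sign_pm1(np.where(flip, tau - y, y - tau))
assert np.array_equal(ref, folded)
```

Because the folded activation reduces to an integer comparison against a precomputed constant, the multiply, divide, and add of BN disappear from the inference datapath entirely, which is the source of the latency savings the abstract claims.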
