CQNN: a CGRA-based QNN Framework

Quantized Neural Networks (QNNs) have drawn tremendous attention since - when compared with Convolution Neural Networks (CNNs) - they often dramatically reduce computation, communication, and storage demands with negligible loss in accuracy. To find an optimal balance between performance and accuracy, developers use different data-widths for different layers and channels. Given this large parameter space, it is challenging to design a QNN accelerator which is generally efficient for various and flexible model configurations. In this paper we propose CQNN, a novel Coarse-Grained Reconfigurable Architecture-based (CGRA) QNN acceleration framework. CQNN has a large number of basic components for binary functions. By programming CQNN at runtime according to the target QNN model, these basic components are integrated to support QNN functions with any data-width and hyperparameter requirements. The result is an optimal QNN for the target model. The framework includes compiler, hardware design, simulator, and RTL generator. Experimental results show CQNNs can complete the inference of AlexNet and VGG-16 within 0.13ms and 2.63ms, respectively. We demonstrate the design on an FPGA platform; however, this is only for showcasing the method: the approach does not rely on any FPGA-specific features and can thus be implemented as ASIC as well.

[1]  Daisuke Miyashita,et al.  Convolutional Neural Networks using Logarithmic Data Representation , 2016, ArXiv.

[2]  Gang Hua,et al.  How to Train a Compact Binary Neural Network with High Accuracy? , 2017, AAAI.

[3]  Wayne Luk,et al.  FP-BNN: Binarized neural network on FPGA , 2018, Neurocomputing.

[4]  Eunhyeok Park,et al.  Value-aware Quantization for Training and Inference of Neural Networks , 2018, ECCV.

[5]  Ran El-Yaniv,et al.  Binarized Neural Networks , 2016, NIPS.

[6]  Ran El-Yaniv,et al.  Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..

[7]  Wei Wu,et al.  O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference , 2021, IEEE Transactions on Parallel and Distributed Systems.

[8]  Vijay Janapa Reddi,et al.  Quantized Neural Network Inference with Precision Batching , 2020, ArXiv.

[9]  Martin C. Herbordt,et al.  BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets , 2019, SC.

[10]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[11]  Martin C. Herbordt,et al.  O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning , 2019, ICS.

[12]  Chen Yang,et al.  FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[13]  Tianqi Wang,et al.  LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism , 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[14]  Martin C. Herbordt,et al.  A Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Work and Weight Load Balancing , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Hao Wu,et al.  Mixed Precision Training , 2017, ICLR.

[17]  Tianqi Wang,et al.  FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters , 2020, IEEE Transactions on Computers.

[18]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[19]  Eunhyeok Park,et al.  Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[20]  Farinaz Koushanfar,et al.  ReBNet: Residual Binarized Neural Network , 2017, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[21]  Yoshua Bengio,et al.  BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[22]  Philip Heng Wai Leong,et al.  FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.

[23]  Shuchang Zhou,et al.  DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.

[24]  Kenneth O'Brien,et al.  FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks , 2018 .

[25]  Antonino Tumeo,et al.  AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing , 2019, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Zhijian Liu,et al.  HAQ: Hardware-Aware Automated Quantization With Mixed Precision , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.