Paper information - MSB-First Distributed Arithmetic Circuit for Convolution Neural Network Computation

This paper presents a novel circuit implementation of the convolution filters and rectified linear activation function (ReLU) used in deep neural networks. By computing in an MSB-first, bit-serial manner, the circuit can detect early whether the result of a filter computation will be negative. Because ReLU maps every negative result to zero, the remaining computation can then be terminated to save power. The circuit also stores precomputed partial products in look-up tables, which eliminates the need for multiplier circuits and yields a hardware-efficient implementation. The proposed circuit is implemented on an FPGA, where it shows significant power and performance improvements over a conventional design implemented on the same FPGA.
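
The two ideas in the abstract, look-up tables in place of multipliers and MSB-first early termination, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's circuit: it assumes unsigned inputs (as produced by a preceding ReLU), signed integer weights, and a simple upper-bound test for deciding that the result can no longer become positive. The names `build_lut` and `da_relu_msb_first` are hypothetical.

```python
def build_lut(weights):
    """Precompute, for every K-bit address, the sum of the weights it selects."""
    k = len(weights)
    return [
        sum(w for i, w in enumerate(weights) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_relu_msb_first(weights, inputs, bits=8):
    """Model of relu(dot(weights, inputs)) computed one bit-plane at a time.

    Assumes every input is an unsigned integer below 2**bits.
    Returns (result, number of bit-planes actually processed).
    """
    lut = build_lut(weights)
    # Largest amount one bit-plane can add; lut[0] == 0, so this is >= 0.
    max_gain = max(lut)
    acc = 0
    for step, b in enumerate(range(bits - 1, -1, -1), start=1):
        # Gather bit b of every input into a LUT address (no multiplier needed).
        addr = 0
        for i, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << i
        acc += lut[addr] << b
        # Assumed stopping rule: the remaining lower bit-planes can add at
        # most max_gain * (2**b - 1). If even that cannot lift the sum above
        # zero, ReLU will output 0 and the remaining planes can be skipped.
        if acc + max_gain * ((1 << b) - 1) <= 0:
            return 0, step
    return max(acc, 0), bits


weights = [3, -5, 2, -4]
print(da_relu_msb_first(weights, [10, 200, 7, 150]))  # (0, 1): stops after the MSB plane
print(da_relu_msb_first(weights, [200, 10, 150, 7]))  # (822, 8): all planes needed
```

In the first call the true dot product is -1556, and the model already knows after the most significant bit-plane that the sum cannot recover, so seven of the eight planes are skipped. In the second call the result is positive and every plane is processed.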
