An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference

While deep convolutional neural networks (CNNs) have emerged as the driving force behind a wide range of domains, their computational and memory demands hinder further deployment in mobile and embedded applications. Recently, CNNs with low-precision parameters have attracted considerable research attention. Among them, multiplier-free binary- and ternary-weight CNNs are reported to achieve recognition accuracy comparable to that of full-precision networks, and have been employed to improve hardware efficiency. However, even with weights constrained to binary or ternary values, large-scale CNNs still require billions of operations in a single forward pass. In this paper, we introduce a novel approach to maximally eliminate redundancy in binary- and ternary-weight CNN inference, improving both performance and energy efficiency. The original kernels are transformed into far fewer and sparser ones, and the output feature maps are rebuilt from the intermediate results, so the total number of convolution operations is reduced. To find an efficient transformation for each trained network, we propose a search algorithm that iteratively matches and eliminates overlaps within a set of kernels. We also design a dedicated hardware architecture to optimize the implementation of kernel transformation, with a specialized dataflow and scheduling method. Tested on SVHN, AlexNet, and VGG-16, our architecture removes 43.4%–79.9% of the operations and speeds up inference by 1.48–3.01 times.
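
The matching idea can be illustrated with a small sketch. The code below is a minimal greedy version of iterative pairwise matching over flattened ternary kernels, not the paper's exact search algorithm or hardware mapping; the kernel size, the min_match threshold, and names such as pairwise_match, shared, and rebuild are illustrative assumptions. Each step extracts the largest {-1, 0, +1} sub-pattern shared by two kernels into an intermediate kernel, whose partial sum both owners reuse when their output feature maps are rebuilt, reducing the additions needed per output position.

    # Sketch only: greedy pairwise matching of shared sub-patterns in
    # flattened ternary kernels; thresholds and names are assumptions.
    import numpy as np

    def overlap(a, b):
        """Positions where two flattened ternary kernels agree on a nonzero weight."""
        return (a == b) & (a != 0)

    def pairwise_match(kernels, min_match=3):
        """Iteratively factor out the largest shared sub-pattern of any kernel pair."""
        kernels = [k.copy() for k in kernels]
        shared = []    # extracted intermediate (shared) kernels
        rebuild = []   # (owner kernel index, shared kernel index) reconstruction pairs
        while True:
            best = None
            for i in range(len(kernels)):
                for j in range(i + 1, len(kernels)):
                    size = int(overlap(kernels[i], kernels[j]).sum())
                    if best is None or size > best[0]:
                        best = (size, i, j)
            if best is None or best[0] < min_match:
                break
            _, i, j = best
            mask = overlap(kernels[i], kernels[j])
            shared.append(np.where(mask, kernels[i], 0))
            for idx in (i, j):
                # Zero out the shared part; the owner adds the shared partial
                # sum back when its output feature map is rebuilt.
                kernels[idx] = np.where(mask, 0, kernels[idx])
                rebuild.append((idx, len(shared) - 1))
        return kernels, shared, rebuild

    rng = np.random.default_rng(0)
    ks = [rng.integers(-1, 2, size=27) for _ in range(4)]  # four 3x3x3 kernels, flattened
    residual, shared, rebuild = pairwise_match(ks)
    adds_before = sum(int(np.count_nonzero(k)) for k in ks)
    adds_after = (sum(int(np.count_nonzero(k)) for k in residual)
                  + sum(int(np.count_nonzero(s)) for s in shared)
                  + len(rebuild))  # one extra add per reuse of a shared partial sum
    print(f"additions per output position: {adds_before} -> {adds_after}")

With min_match set to 3, each extraction replaces 2s additions by s + 2 (one evaluation of the shared pattern plus one reuse add per owner), so every accepted match strictly reduces the operation count; the paper's search explores such matches across all kernels of an already trained network.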
