A Convolutional Result Sharing Approach for Binarized Neural Network Inference

The binary-weight, binary-input binarized neural network (BNN) enables a much more efficient implementation of convolutional neural networks (CNNs) on mobile platforms. During inference, the multiply-accumulate operations in BNNs reduce to XNOR-popcount operations, which therefore dominate the computation in BNNs. To reduce the number of operations required in the convolution layers of BNNs, we decompose the 3-D filters into 2-D filters and exploit repeated filters, inverse filters, and similar filters to share results. By sharing results, the number of operations in the convolution layers of BNNs can be reduced effectively. Experimental results show that the number of operations can be reduced by about 60% for CIFAR-10 on BNNs while keeping the accuracy loss within 1% of the originally trained network.
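The two ideas in the abstract, replacing multiply-accumulate with XNOR-popcount and reusing the outputs of repeated or inverse 2-D filters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (xnor_popcount, conv2d_with_sharing) and the dense NumPy representation of the binary values as +1/-1 are assumptions made for illustration, and the handling of merely similar (non-identical) filters, which would need a small correction term, is omitted.

import numpy as np

def xnor_popcount(window, kernel):
    # For {-1, +1} encodings, XNOR followed by popcount equals the dot
    # product: matching positions contribute +1, mismatches contribute -1.
    matches = np.count_nonzero(window == kernel)
    return 2 * matches - window.size

def conv2d_with_sharing(feature_map, kernels_2d):
    # Convolve one binary input channel with several 2-D binary kernels,
    # reusing results for repeated and inverse (sign-flipped) kernels.
    H, W = feature_map.shape
    k = kernels_2d.shape[1]
    out = np.zeros((len(kernels_2d), H - k + 1, W - k + 1), dtype=np.int32)
    cache = {}  # kernel byte signature -> index of the kernel first computed
    for idx, ker in enumerate(kernels_2d):
        key, inv_key = ker.tobytes(), (-ker).tobytes()
        if key in cache:            # repeated filter: copy the shared result
            out[idx] = out[cache[key]]
        elif inv_key in cache:      # inverse filter: negate the shared result
            out[idx] = -out[cache[inv_key]]
        else:                       # new filter: compute via XNOR-popcount
            for i in range(H - k + 1):
                for j in range(W - k + 1):
                    out[idx, i, j] = xnor_popcount(
                        feature_map[i:i + k, j:j + k], ker)
            cache[key] = idx
    return out

With a 3-D filter decomposed into its per-channel 2-D slices, every repeated or inverse slice costs only a copy or a sign flip instead of a full XNOR-popcount pass, which is the source of the operation savings described above.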
