TB-DNN: A Thin Binarized Deep Neural Network with High Accuracy

Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) applications. However, due to their huge demand for computation and storage resources, as well as their high power consumption, deploying DNN models on embedded devices is challenging. Recent works have shown that DNN models can be compressed by removing their inner redundancy without obvious performance degradation. In this work, we propose a two-stage compression pipeline for the ResNet-14 model and evaluate it on the CIFAR-10 and SVHN datasets. First, we apply a filter-level pruning method that removes the less important filters at different compression rates, which substantially reduces the computation cost. Second, we binarize the pruned model to further reduce model size and computational complexity. The training results show that we achieve 87.7% accuracy with a model size of only 1.86 Mb on CIFAR-10 and 96.2% accuracy with 1.34 Mb on SVHN. Compared to the original model, this yields a 57% to 68% reduction in FLOPs and 45.6× to 63.1× model size compression, at the cost of roughly a 4% accuracy drop. Finally, we implement the thin binarized ResNet-14 model on the Xilinx KC705 board with a shared, flexible accumulator, which saves 46.8% of logic resources. All network parameters are stored in on-chip RAM, which greatly reduces the energy consumption and memory overhead caused by off-chip accesses. The experimental results show that, on the CIFAR-10 dataset, we achieve an overall throughput of 1200 FPS and an energy efficiency of 571 FPS/W, representing 2.3× and 3.6× improvements over the most recent work.
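The two compression stages can be illustrated with a minimal sketch. This assumes a PyTorch implementation, uses the L1 norm of each filter as the pruning criterion, and binarizes weights with a sign function plus a straight-through estimator; the helper names below (select_filters_to_keep, BinaryConv2d) are hypothetical and do not reproduce the exact criterion, training schedule, or ResNet-14 architecture used in the paper.

```python
import torch
import torch.nn as nn

# Stage 1: filter-level pruning (sketch).
# Assumption: filter importance is scored by the L1 norm of each filter's
# weights; the abstract only states that "less important filters" are removed,
# so this criterion is illustrative.
def select_filters_to_keep(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
    """Return indices of the output filters to keep, ranked by L1 norm.

    Building the thinner network then requires slicing this layer's weights
    (and the input channels of the following layer) with these indices.
    """
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    return torch.argsort(importance, descending=True)[:n_keep]

# Stage 2: binarization (sketch).
# Weights are mapped to {-1, +1} with sign() in the forward pass; the
# straight-through estimator passes gradients back to the latent
# full-precision weights during training.
class BinarizeWeight(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator, clipped to |w| <= 1.
        return grad_out * (w.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized on the fly in forward()."""
    def forward(self, x):
        w_bin = BinarizeWeight.apply(self.weight)
        return nn.functional.conv2d(x, w_bin, self.bias, self.stride,
                                    self.padding, self.dilation, self.groups)
```

In deployment, the binarized weights let multiply-accumulate operations be replaced by XNOR and popcount logic, which is what allows the whole pruned model to fit in on-chip RAM and share accumulator resources on the FPGA.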
