A Convolutional Accelerator for Neural Networks With Binary Weights

Parallel processors and GP-GPUs have routinely been used to perform the computations of convolutional neural networks (CNNs). However, their large power consumption has pushed researchers towards application-specific integrated circuits and on-chip accelerators that implement neural networks. Nevertheless, in the Internet of Things (IoT) scenario, even these accelerators fail to meet power and latency constraints. To address this issue, binary-weight networks were introduced, in which the weights are constrained to −1 and +1. These networks facilitate hardware implementation by replacing multiply-and-accumulate units with simple accumulators and by reducing weight storage requirements. In this paper, we introduce a convolutional accelerator for binary-weight neural networks. The proposed architecture consumes only 128 mW at 200 MHz and occupies 1.2 mm² when synthesized in TSMC 65 nm CMOS technology. Moreover, it achieves a high area efficiency of 176 Gops/MGC and a performance efficiency of 89%, outperforming the state-of-the-art architecture for binary-weight networks by 1.8× and 3.2×, respectively.
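To illustrate the core idea the abstract relies on, the sketch below shows why binary weights remove the need for multipliers: a dot product with weights in {−1, +1} reduces to conditionally adding or subtracting each activation. This is a minimal, hypothetical Python example of the general technique, not a description of the paper's accelerator datapath; the function name and values are illustrative only.

```python
# Minimal sketch (illustrative, not the paper's architecture): with weights
# constrained to +1/-1, a multiply-and-accumulate collapses into an
# accumulator that adds or subtracts each activation based on the weight sign.

def binary_weight_dot(activations, weights):
    """Dot product with weights restricted to +1 or -1; no multiplications."""
    acc = 0
    for a, w in zip(activations, weights):
        acc += a if w == 1 else -a  # add or subtract, never multiply
    return acc

# Equivalent to sum(a * w) when every w is +1 or -1:
print(binary_weight_dot([3, -2, 5, 1], [1, -1, 1, -1]))  # 3 + 2 + 5 - 1 = 9
```

In hardware, the same observation lets each processing element be a sign-controlled adder rather than a full multiplier, which is the source of the area and power savings the abstract cites.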
