A Convolutional Accelerator for Neural Networks With Binary Weights

Parallel processors and GP-GPUs have routinely been used to perform the computations of convolutional neural networks (CNNs). However, their large power consumption has pushed researchers towards application-specific integrated circuits and on-chip accelerators that implement neural networks. Nevertheless, in the Internet of Things (IoT) scenario, even these accelerators fail to meet power and latency constraints. To address this issue, binary-weight networks were introduced, in which the weights are constrained to −1 and +1. These networks facilitate hardware implementation by replacing multiply-and-accumulate units with simple accumulators and by reducing weight storage requirements. In this paper, we introduce a convolutional accelerator for binary-weight neural networks. The proposed architecture consumes only 128 mW at 200 MHz and occupies 1.2 mm² when synthesized in TSMC 65 nm CMOS technology. Moreover, it achieves a high area efficiency of 176 Gops/MGC and a performance efficiency of 89%, outperforming the state-of-the-art architecture for binary-weight networks by 1.8× and 3.2×, respectively.
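To illustrate the core idea the abstract relies on, the sketch below shows why binary weights remove the need for multipliers: a dot product with weights in {−1, +1} reduces to conditionally adding or subtracting each activation. This is a minimal, hypothetical Python example of the general technique, not a description of the paper's accelerator datapath; the function name and values are illustrative only.

```python
# Minimal sketch (illustrative, not the paper's architecture): with weights
# constrained to +1/-1, a multiply-and-accumulate collapses into an
# accumulator that adds or subtracts each activation based on the weight sign.

def binary_weight_dot(activations, weights):
    """Dot product with weights restricted to +1 or -1; no multiplications."""
    acc = 0
    for a, w in zip(activations, weights):
        acc += a if w == 1 else -a  # add or subtract, never multiply
    return acc

# Equivalent to sum(a * w) when every w is +1 or -1:
print(binary_weight_dot([3, -2, 5, 1], [1, -1, 1, -1]))  # 3 + 2 + 5 - 1 = 9
```

In hardware, the same observation lets each processing element be a sign-controlled adder rather than a full multiplier, which is the source of the area and power savings the abstract cites.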
