Hybrid Convolution Architecture for Energy-Efficient Deep Neural Network Processing

This paper presents a convolution process and its hardware architecture for energy-efficient deep neural network (DNN) processing. A DNN generally consists of a number of convolutional layers, and in a shallow layer the number of input features involved in the convolution is larger than the number of kernels. As the layers deepen, however, the number of input features decreases while the number of kernels increases. Previous convolution architectures developed for enhancing energy efficiency have tried to reduce memory accesses by increasing the reuse of data once fetched from memory, but they still incur redundant memory accesses because this layer-by-layer change in data counts is not taken into account. We propose a hybrid convolution process that selects either a kernel-stay or a feature-stay process according to the relative numbers of input features and kernels, along with a forwarding technique that further reduces the memory accesses needed to store and load partial sums. The proposed convolution process is effective in maximizing data reuse, leading to an energy-efficient hybrid convolution architecture. Compared to state-of-the-art architectures, the proposed architecture improves energy efficiency by up to 2.38 times in a 65 nm CMOS process.
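
To make the selection idea concrete, the sketch below compares the amount of input-feature data against the amount of kernel data per layer and keeps the smaller set resident on chip while streaming the larger set. This is a minimal illustration assuming that interpretation of "taking into account the numbers of data"; the comparison rule, the layer shapes, and all names here are illustrative assumptions, not the paper's exact policy.

```python
# Illustrative sketch (assumed selection rule, not the paper's exact policy):
# per layer, hold the smaller operand set in local storage and stream the
# larger one past it, which is what maximizes reuse of the resident data.

from dataclasses import dataclass


@dataclass
class ConvLayer:
    name: str
    in_h: int    # input feature-map height
    in_w: int    # input feature-map width
    in_c: int    # input channels
    k: int       # kernel height/width
    out_c: int   # number of kernels (output channels)

    @property
    def feature_count(self) -> int:
        """Total input activations taking part in the convolution."""
        return self.in_h * self.in_w * self.in_c

    @property
    def kernel_count(self) -> int:
        """Total kernel weights of the layer."""
        return self.k * self.k * self.in_c * self.out_c


def choose_process(layer: ConvLayer) -> str:
    # Shallow layers: many activations, few weights -> hold the kernels
    # locally ("kernel-stay") and stream the features past them.
    # Deep layers: few activations, many weights -> hold the features
    # locally ("feature-stay") and stream the kernels instead.
    if layer.feature_count >= layer.kernel_count:
        return "kernel-stay"
    return "feature-stay"


# VGG-16-style layer shapes, shallow to deep, to show the crossover.
for layer in (ConvLayer("conv1_1", 224, 224, 3, 3, 64),
              ConvLayer("conv3_1", 56, 56, 128, 3, 256),
              ConvLayer("conv5_3", 14, 14, 512, 3, 512)):
    print(f"{layer.name}: {choose_process(layer)}")
```

Running this prints kernel-stay for the shallow layers (e.g., conv1_1 has 224x224x3 = 150,528 input activations against only 1,728 weights) and feature-stay for the deep ones (conv5_3 has 100,352 activations against 2,359,296 weights), matching the feature/kernel crossover the abstract describes.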
