Data-centric computation mode for convolution in deep neural networks

Deep Convolutional Neural Network (CNN) based methods have shown outstanding performance across a wide range of applications. As neural networks grow deeper, they demand substantial computation and memory resources. Customized hardware is one option for maintaining high performance at lower energy consumption than general-purpose CPUs or GPUs. When designing such hardware, we must address the problem of massive data transmission while ensuring high throughput, since moving data often consumes more energy than the computation itself, and this problem worsens as networks scale up. In this paper, we focus on the convolution operation, which accounts for nearly 90% of the computation and runtime in deep CNNs. We propose a data-centric computation mode for convolution that efficiently reduces the total data transfer required during convolution and exploits data locality to achieve high throughput. Unlike previous methods, which rely on efficient on-chip memory hierarchies or focus on the movement of partial results, our method concentrates on the convolution operands themselves, minimizing data transfer from the start; it can therefore be combined with those techniques for even higher energy efficiency. Furthermore, we simulate and analyse the hardware overhead of our data-centric convolution, corroborating its potential to deliver high throughput at low energy consumption.
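
To make the data-transfer pressure concrete, below is a minimal sketch of the direct convolution loop nest that dominates CNN computation. The shapes, names, and naive schedule here are illustrative assumptions for exposition, not the paper's accelerator dataflow.

```python
import numpy as np

def conv2d_direct(inputs, weights):
    """Direct convolution (stride 1, no padding):
    out[co, y, x] = sum over ci, ky, kx of
        inputs[ci, y + ky, x + kx] * weights[co, ci, ky, kx]
    """
    C_in, H, W = inputs.shape
    C_out, _, K, _ = weights.shape
    H_out, W_out = H - K + 1, W - K + 1
    out = np.zeros((C_out, H_out, W_out))
    for co in range(C_out):
        for y in range(H_out):
            for x in range(W_out):
                for ci in range(C_in):
                    for ky in range(K):
                        for kx in range(K):
                            # In this naive schedule each input element is
                            # re-read for every overlapping window and every
                            # output channel -- exactly the operand traffic a
                            # data-centric dataflow tries to keep on-chip.
                            out[co, y, x] += (inputs[ci, y + ky, x + kx]
                                              * weights[co, ci, ky, kx])
    return out
```

Each input activation is read up to K * K * C_out times across overlapping windows (and each weight H_out * W_out times), so a schedule that keeps operands close to the arithmetic units avoids refetching them from off-chip memory on every use.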
