Optimizing Weight Mapping and Data Flow for Convolutional Neural Networks on RRAM-Based Processing-in-Memory Architecture

Resistive random access memory (RRAM) based array architectures have been proposed for on-chip acceleration of convolutional neural networks (CNNs), where the array can be configured to compute dot products in parallel by summing the column currents. Prior processing-in-memory (PIM) designs unroll each 3D kernel of a convolutional layer into one vertical column of a large weight matrix, so the same input data must be accessed multiple times. As a result, significant latency and energy are consumed in the interconnect and buffers. In this paper, to maximize both weight and input data reuse in an RRAM-based PIM architecture, we propose a novel weight mapping method and a corresponding data flow that divide the kernels and assign the input data to different processing elements (PEs) according to their spatial locations. The proposed design achieves ∼65% savings in interconnect and buffer latency and energy, and yields an overall 2.1× speedup and a ∼17% improvement in energy efficiency (TOPS/W) on the VGG-16 CNN, compared with a prior design based on the conventional mapping method.
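
To make the two mappings concrete, below is a minimal NumPy sketch, not taken from the paper: the function names, tensor shapes, and stride handling are illustrative assumptions. The conventional mapping unrolls each kernel into one crossbar column and duplicates each input pixel across up to K×K overlapping patches, which is exactly the buffer and interconnect traffic the abstract describes; the spatial-split sketch instead assigns the weight slice at each kernel offset to its own PE, so each PE streams a non-overlapping strided view of the input.

```python
import numpy as np

# --- Conventional mapping (prior PIM designs) -------------------------------
# Each 3D kernel (C x K x K) is unrolled into one column of a 2D weight
# matrix programmed into the crossbar; an im2col-style unrolling of the
# input then turns the convolution into a single matrix product, evaluated
# physically by summing column currents. Overlapping patches copy each
# input pixel up to K*K times.
def conv_unrolled(x, kernels, stride=1):
    N, C, K, _ = kernels.shape
    _, H, W = x.shape
    out_h, out_w = (H - K) // stride + 1, (W - K) // stride + 1
    Wmat = kernels.reshape(N, -1).T                 # (C*K*K, N) conductances
    cols = np.empty((C * K * K, out_h * out_w))     # duplicated input patches
    positions = ((i, j) for i in range(0, H - K + 1, stride)
                        for j in range(0, W - K + 1, stride))
    for idx, (i, j) in enumerate(positions):
        cols[:, idx] = x[:, i:i + K, j:j + K].ravel()
    return (Wmat.T @ cols).reshape(N, out_h, out_w)

# --- Spatial-split mapping (hedged sketch of the proposed idea) -------------
# Divide each kernel by its K x K spatial offsets and map the N x C weight
# slice at offset (ki, kj) to its own PE. Each PE reads only a strided view
# of the input (each pixel fetched once per PE, no overlapping copies), and
# the K*K partial sums are accumulated into the output.
def conv_spatial_split(x, kernels, stride=1):
    N, C, K, _ = kernels.shape
    _, H, W = x.shape
    out_h, out_w = (H - K) // stride + 1, (W - K) // stride + 1
    out = np.zeros((N, out_h, out_w))
    for ki in range(K):
        for kj in range(K):
            w = kernels[:, :, ki, kj]                   # (N, C) per-PE weights
            xs = x[:, ki:ki + stride * out_h:stride,
                      kj:kj + stride * out_w:stride]    # (C, out_h, out_w)
            out += (w @ xs.reshape(C, -1)).reshape(N, out_h, out_w)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))        # C=3 input channels, H=W=8
    k = rng.standard_normal((4, 3, 3, 3))     # N=4 kernels, K=3
    assert np.allclose(conv_unrolled(x, k), conv_spatial_split(x, k))
```

The closing assertion checks that both mappings compute the same convolution; the two differ only in how the weights are partitioned across the array and how the input data flow is organized, which is where the reported interconnect and buffer savings come from.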
