IM3A: Boosting Deep Neural Network Efficiency via In-Memory Addressing-Assisted Acceleration

Most existing RRAM-based designs require expensive analog-to-digital converters (ADCs), digital-to-analog converters (DACs), and excessive crossbar occupation to achieve efficient acceleration. To reduce the overhead of DACs, the existing solution splits the input into a bit sequence, but a MAC operation that could complete in one cycle is then forced to take multiple cycles, decreasing energy efficiency. For ADCs, the common approach partitions each weight across multiple cells, resulting in an excessive number of crossbars, or frequent writes when the number of crossbars is insufficient. To solve this problem, we propose IM3A, an In-Memory Addressing-Assisted Acceleration scheme. IM3A decomposes MAC operations into multiplication and accumulation, which are implemented separately through the content-addressable and multiply-accumulate capabilities of the crossbar. Energy efficiency is improved because the CAM crossbar supports parallel search over very large numbers of data bits, and the RRAM crossbar selectively enables the rows to be read based on the hit result of the CAM search. Therefore, only the possible operands involved in MAC are deployed on the crossbar. Experimental results show that IM3A, applied to various networks, improves system energy efficiency by 1.7x ∼ 15.9x over two state-of-the-art crossbar accelerators: ISAAC and PIM-Prune.
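The decomposition described above can be illustrated with a minimal functional sketch. This is an assumption-laden software model, not the paper's hardware design: `im3a_mac` is a hypothetical name, and the CAM search and selective row enabling are emulated with boolean masks over a weight matrix.

```python
import numpy as np

def im3a_mac(inputs, weights):
    """Functional sketch of CAM-assisted MAC (hypothetical software model).

    inputs  : 1-D integer activation vector (one entry per crossbar row)
    weights : 2-D weight matrix whose rows align with the inputs
    """
    # Only the operand values that actually occur are "deployed": the CAM
    # stores each distinct value once and searches all rows in parallel.
    unique_vals = np.unique(inputs)
    acc = np.zeros(weights.shape[1], dtype=float)
    for v in unique_vals:
        if v == 0:
            continue                      # zero operands never enable a row
        hit_rows = (inputs == v)          # emulated parallel CAM search: hit mask
        # Selectively enable only the hit rows of the RRAM crossbar and
        # accumulate their column sums, scaled by the matched operand value.
        acc += v * weights[hit_rows].sum(axis=0)
    return acc

x = np.array([0, 2, 2, 1])
W = np.arange(8).reshape(4, 2)
# The decomposed search-then-accumulate result matches the dense MAC x @ W.
assert np.allclose(im3a_mac(x, W), x @ W)
```

The point of the sketch is that the inner loop runs once per distinct operand value rather than once per input element, mirroring how the CAM search amortizes multiplication over all rows holding the same value.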

[1] Yusuf Leblebici, et al. A 3.1 mW 8b 1.2 GS/s Single-Channel Asynchronous SAR ADC With Alternate Comparators for Enhanced Speed in 32 nm Digital SOI CMOS, 2013, IEEE Journal of Solid-State Circuits.

[2] Wei Wang, et al. Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks, 2020, ICLR.

[3] Eric Pop, et al. Ternary content-addressable memory with MoS2 transistors for massively parallel data search, 2019, Nature Electronics.

[4] Vijaykrishnan Narayanan, et al. GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).

[5] Cong Xu, et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory, 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[6] Yanzhi Wang, et al. PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture, 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).

[7] Miao Hu, et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[8] Hoi-Jun Yoo, et al. UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision, 2019, IEEE Journal of Solid-State Circuits.

[9] Yu Wang, et al. Low Bit-Width Convolutional Neural Network on RRAM, 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[10] Yiran Chen, et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[11] Bin Gao, et al. Fully hardware-implemented memristor convolutional neural network, 2020, Nature.

[12] Wei Tang, et al. CASCADE: Connecting RRAMs to Extend Analog Dataflow In An End-To-End In-Memory Processing Paradigm, 2019, MICRO.

[13] Chia-Lin Yang, et al. Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[14] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15] Tao Zhang, et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[16] Li Jiang, et al. DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).