Connection-based Processing-In-Memory Engine Design Based on Resistive Crossbars

Deep neural networks have been applied successfully in various fields, and the efficient deployment of neural network models has emerged as a new challenge. Processing-in-memory (PIM) engines, which carry out computation within memory structures, are widely studied as a way to improve computation efficiency and data communication speed. In particular, resistive memory crossbars can naturally realize dot-product operations and show great potential in PIM design. The common practice in a current-based design is to map a matrix to a crossbar, apply the input data from one side of the crossbar, and extract the accumulated currents as the computation results in the orthogonal direction. In this study, we propose a novel PIM design concept that is based on the crossbar connections. Our analysis of the star-mesh network transformation reveals that, in a crossbar storing both the input data and the weight matrix, the dot-product result is embedded within the network connection. Our proposed connection-based PIM design leverages this feature and discovers the latent dot-products directly from the connection information. Moreover, in the connection-based PIM design, the output current range of the resistive crossbars can easily be adjusted, leading to a more linear conversion to voltage values, and the output circuitry can be shared by multiple resistive crossbars. The simulation results show that our design achieves, on average, 46.23% and 33.11% reductions in area and energy consumption, respectively, with a mere 3.85% latency overhead compared with current-based designs.
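The two circuit ideas named in the abstract can be illustrated with a small numerical sketch. It is not the authors' design: the function names, the ideal-device assumptions (no wire resistance, no device variation), and the example values are all hypothetical. The first function models the conventional current-based readout, where each column current is the sum of input voltage times cell conductance. The second applies the textbook star-mesh transformation, in which eliminating the centre node of a star with branch conductances g_i yields a mesh conductance g_i * g_j / sum(g) between nodes i and j. The pairwise products in that formula hint at how products of stored values can appear in the connection structure itself.

```python
from typing import List


def crossbar_currents(voltages: List[float],
                      conductances: List[List[float]]) -> List[float]:
    """Ideal current-based readout (assumed model, no parasitics).

    conductances[i][j] is the cell between input row i and output column j.
    By Ohm's and Kirchhoff's laws, column current j is sum_i V_i * G[i][j].
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def star_to_mesh(star: List[float]) -> List[List[float]]:
    """Textbook star-mesh transformation for a single star.

    star[i] is the conductance from outer node i to the centre node.
    After the centre node is eliminated, outer nodes i and j are joined
    by an equivalent conductance star[i] * star[j] / sum(star).
    """
    total = sum(star)
    n = len(star)
    return [
        [0.0 if i == j else star[i] * star[j] / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example values, chosen only for illustration.
currents = crossbar_currents([1.0, 0.5], [[2.0, 1.0], [4.0, 2.0]])
mesh = star_to_mesh([1.0, 2.0, 3.0])
```

In the first example the column currents are the matrix-vector product of the input voltages and the conductance matrix. In the second, every off-diagonal mesh entry is a scaled product of two star conductances, which is the property the abstract attributes to the star-mesh analysis.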
