An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive Applications

The conventional von Neumann architecture has been revealed as a major performance and energy bottleneck for rising data-intensive applications. The decade-old idea of leveraging in-memory processing to eliminate substantial data movements has returned and led extensive research activities. The effectiveness of in-memory processing heavily relies on memory scalability, which cannot be satisfied by traditional memory technologies. Emerging non-volatile memories (eNVMs) that pose appealing qualities such as excellent scaling and low energy consumption, on the other hand, have been heavily investigated and explored for realizing in-memory processing architecture. In this paper, we summarize the recent research progress in eNVM-based in-memory processing from various aspects, including the adopted memory technologies, locations of the in-memory processing in the system, supported arithmetics, as well as applied applications.

[1]  Thomas P. Parnell,et al.  Temporal correlation detection using computational phase-change memory , 2017, Nature Communications.

[2]  Yuhua Cheng,et al.  Solenoid Model for the Magnetic Flux Leakage Testing Based on the Molecular Current , 2018, IEEE Transactions on Magnetics.

[3]  Kaushik Roy,et al.  In-situ, In-Memory Stateful Vector Logic Operations based on Voltage Controlled Magnetic Anisotropy , 2018, Scientific Reports.

[4]  Onur Mutlu,et al.  The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[5]  Engin Ipek,et al.  Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017 .

[6]  Feifei Li,et al.  NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[7]  Abu Sebastian,et al.  Tutorial: Brain-inspired computing using phase-change memory devices , 2018, Journal of Applied Physics.

[8]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[9]  Yiran Chen,et al.  PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[10]  Jun Yang,et al.  DrAcc: a DRAM based Accelerator for Accurate CNN Inference , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[11]  Y.C. Chen,et al.  Write Strategies for 2 and 4-bit Multi-Level Phase-Change Memory , 2007, 2007 IEEE International Electron Devices Meeting.

[12]  Yuan Xie,et al.  DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Uri C. Weiser,et al.  MAGIC—Memristor-Aided Logic , 2014, IEEE Transactions on Circuits and Systems II: Express Briefs.

[14]  C. Wright,et al.  Arithmetic and Biologically-Inspired Computing Using Phase-Change Materials , 2011, Advanced materials.

[15]  Kaushik Roy,et al.  X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories , 2017, IEEE Transactions on Circuits and Systems I: Regular Papers.

[16]  Yiran Chen,et al.  ReRAM-based accelerator for deep learning , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[17]  Sujan Kumar Gonugondla,et al.  A Multi-Functional In-Memory Inference Processor Using a Standard 6T SRAM Array , 2018, IEEE Journal of Solid-State Circuits.

[18]  Yu Hu,et al.  Power-Utility-Driven Write Management for MLC PCM , 2017, ACM J. Emerg. Technol. Comput. Syst..

[19]  Shaahin Angizi,et al.  HielM: Highly flexible in-memory computing using STT MRAM , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[20]  Yiran Chen,et al.  GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[21]  Dong Li,et al.  A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-Volatile On-Chip Caches , 2015, IEEE Transactions on Parallel and Distributed Systems.

[22]  Cong Xu,et al.  Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[23]  Anand Raghunathan,et al.  Computing in Memory With Spin-Transfer Torque Magnetic RAM , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[24]  A. Sebastian,et al.  Compressed sensing recovery using computational memory , 2017, 2017 IEEE International Electron Devices Meeting (IEDM).

[25]  Haralampos Pozidis,et al.  Recent Progress in Phase-Change Memory Technology , 2016, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[26]  Kaushik Roy,et al.  Future cache design using STT MRAMs for improved energy efficiency: Devices, circuits and architecture , 2012, DAC Design Automation Conference 2012.

[27]  Pritish Narayanan,et al.  Experimental Demonstration and Tolerancing of a Large-Scale Neural Network (165 000 Synapses) Using Phase-Change Memory as the Synaptic Weight Element , 2014, IEEE Transactions on Electron Devices.

[28]  Abu Sebastian,et al.  Accumulation-Based Computing Using Phase-Change Memories With FET Access Devices , 2015, IEEE Electron Device Letters.

[29]  Shimeng Yu,et al.  Metal–Oxide RRAM , 2012, Proceedings of the IEEE.

[30]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[31]  Mingyu Gao,et al.  HRL: Efficient and flexible reconfigurable logic for near-data processing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[32]  Zhaohao Wang,et al.  In-Memory Processing Paradigm for Bitwise Logic Operations in STT–MRAM , 2017, IEEE Transactions on Magnetics.

[33]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[34]  Chenchen Liu,et al.  Build reliable and efficient neuromorphic design with memristor technology , 2019, ASP-DAC.

[35]  Huanrui Yang,et al.  AtomLayer: A Universal ReRAM-Based CNN Accelerator with Atomic Layer Computation , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[36]  Zhuo Wang,et al.  In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array , 2017, IEEE Journal of Solid-State Circuits.

[37]  C. Wright,et al.  Beyond von‐Neumann Computing with Nanoscale Phase‐Change Memory Devices , 2013 .

[38]  Yiran Chen,et al.  RED: A ReRAM-based Deconvolution Accelerator , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[39]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[40]  Chaitali Chakrabarti,et al.  A Parallel RRAM Synaptic Array Architecture for Energy-Efficient Recurrent Neural Networks , 2018, 2018 IEEE International Workshop on Signal Processing Systems (SiPS).

[41]  Pritish Narayanan,et al.  Equivalent-accuracy accelerated neural-network training using analogue memory , 2018, Nature.

[42]  Yiran Chen,et al.  Understanding the trade-offs of device, circuit and application in ReRAM-based neuromorphic computing systems , 2017, 2017 IEEE International Electron Devices Meeting (IEDM).

[43]  Sparsh Mittal,et al.  A survey of architectural techniques for improving cache power efficiency , 2014, Sustain. Comput. Informatics Syst..

[44]  Tao Zhang,et al.  Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[45]  Christoforos E. Kozyrakis,et al.  TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[46]  Engin Ipek,et al.  Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing , 2010, ISCA.

[47]  Youguang Zhang,et al.  A Multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for Binary Convolutional Neural Network , 2018, IEEE Transactions on Magnetics.

[48]  Sudhakar Yalamanchili,et al.  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[49]  Heiner Giefers,et al.  Mixed-precision in-memory computing , 2017, Nature Electronics.

[50]  Jung Ho Ahn,et al.  NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[51]  Yu Wang,et al.  TIME: A training-in-memory architecture for memristor-based deep neural networks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[52]  Mehdi Baradaran Tahoori,et al.  A cross-layer adaptive approach for performance and power optimization in STT-MRAM , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[53]  D. Ielmini,et al.  Logic Computation in Phase Change Materials by Threshold and Memory Switching , 2013, Advanced materials.

[54]  Shimeng Yu,et al.  Emerging Memory Technologies: Recent Trends and Prospects , 2016, IEEE Solid-State Circuits Magazine.

[55]  Eby G. Friedman,et al.  AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.

[56]  Y. Leblebici,et al.  Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power) , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).

[57]  Mike Ignatowski,et al.  TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.