In-Memory Computing: The Next-Generation AI Computing Paradigm

To overcome the memory bottleneck of the von Neumann architecture, various memory-centric computing techniques are emerging to reduce the latency and energy consumed by data movement. The great success of artificial intelligence (AI) algorithms, which involve large numbers of computations and data transfers, has motivated and accelerated recent research on in-memory computing (IMC), in which memory not only stores data but can also directly produce computation results, significantly reducing or even eliminating off-chip data accesses. For example, the multiply-and-accumulate (MAC) operations in deep learning algorithms can be realized by accessing the memory with the input activations. This paper surveys recent trends in IMC, from device and circuit techniques (SRAM, flash, RRAM, and other types of non-volatile memory) to architectures and applications, serving as a guide to future advances in computing-in-memory (CIM).
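
To make the in-memory MAC idea concrete, here is a minimal sketch (not from the paper) of how a resistive crossbar performs MAC in place: weights are held as cell conductances, input activations are applied as word-line voltages, and each bit line physically accumulates the products via Ohm's and Kirchhoff's laws. All names and shapes are illustrative assumptions, and the model ignores device non-idealities such as variation and wire resistance.

```python
# Idealized crossbar MAC: I_j = sum_i V_i * G_ij
# V (word-line voltages) encodes the input activations,
# G (cell conductances) encodes the stored weights,
# I (bit-line currents) is the MAC result, one value per column.
import numpy as np

rng = np.random.default_rng(0)

def crossbar_mac(voltages: np.ndarray, conductances: np.ndarray) -> np.ndarray:
    """Bit-line currents of an ideal crossbar: one analog MAC per column."""
    # voltages: (rows,) input activations applied to the word lines
    # conductances: (rows, cols) weights programmed into the memory cells
    return voltages @ conductances

# Mapping a fully connected layer onto the array: the memory outputs the
# layer's pre-activations directly, so the weights never leave the array.
activations = rng.random(128)             # input activation vector
weights = rng.standard_normal((128, 64))  # layer weights held in the array
pre_activations = crossbar_mac(activations, weights)
print(pre_activations.shape)  # (64,): one MAC result per bit line
```

In a real IMC macro the same dot product is evaluated in the analog domain in a single array access, which is the source of the latency and energy savings over fetching every weight across the memory bus.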
