In‐Memory Vector‐Matrix Multiplication in Monolithic Complementary Metal–Oxide–Semiconductor‐Memristor Integrated Circuits: Design Choices, Challenges, and Perspectives

The low communication bandwidth between memory and processing units in conventional von Neumann machines cannot keep pace with emerging applications that rely extensively on large data sets. More recent computing paradigms, such as massive parallelization and near‐memory computing, alleviate the data‐communication bottleneck to some extent, but paradigm‐shifting concepts are required. In‐memory computing has emerged as a prime candidate for eliminating this bottleneck by colocating memory and processing. In this context, resistive switching (RS) memory devices are a particularly promising choice: their intrinsic device‐level properties enable both storage and computation within a small, massively parallel footprint at low power. In theory, this translates directly into a major boost in energy efficiency and computational throughput, but various practical challenges remain. A qualitative and quantitative analysis is presented of several key challenges in implementing high‐capacity, high‐volume RS memories to accelerate the most computationally demanding operation in machine learning (ML) inference: vector‐matrix multiplication (VMM). The monolithic integration of RS memories with complementary metal–oxide–semiconductor (CMOS) integrated circuits is presented as the core enabling technology. The key design choices at the level of device physics, circuit design, and system architecture are reviewed, and an outlook on future directions is provided.
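To make the crossbar VMM mapping concrete, the sketch below is a minimal numerical illustration (not taken from the article) of the standard idea: input voltages are applied to the rows of an RS array, each cross‐point stores a conductance, and the current summed on each column realizes one dot product via Ohm's and Kirchhoff's laws. The conductance window, the differential weight mapping, and the variation model are illustrative assumptions chosen only to show the principle.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not values from the article):
# an idealized RS crossbar computes a vector-matrix multiplication in one
# step.  Input voltages V drive the rows, each cross-point holds a
# conductance G[i, j], and the current collected on column j is
# I[j] = sum_i V[i] * G[i, j]  (Ohm's law + Kirchhoff's current law).

rng = np.random.default_rng(0)

G_MIN, G_MAX = 1e-6, 1e-4        # assumed programmable conductance window (S)

def program_weights(W):
    """Map a real-valued weight matrix onto differential conductance pairs."""
    w_abs_max = np.max(np.abs(W)) or 1.0
    scale = (G_MAX - G_MIN) / w_abs_max
    G_pos = G_MIN + scale * np.clip(W, 0, None)    # encodes positive weights
    G_neg = G_MIN + scale * np.clip(-W, 0, None)   # encodes negative weights
    return G_pos, G_neg, scale

def crossbar_vmm(v, G_pos, G_neg, scale, sigma=0.05):
    """One analog VMM pass with a simple multiplicative conductance-variation model."""
    noise = lambda G: G * (1.0 + sigma * rng.standard_normal(G.shape))
    i_pos = v @ noise(G_pos)        # column currents of the 'positive' array
    i_neg = v @ noise(G_neg)        # column currents of the 'negative' array
    return (i_pos - i_neg) / scale  # differential read-out, rescaled to weight units

W = rng.standard_normal((64, 16))   # hypothetical 64x16 weight matrix
v = rng.standard_normal(64)         # hypothetical input activation vector

G_pos, G_neg, scale = program_weights(W)
analog = crossbar_vmm(v, G_pos, G_neg, scale)
digital = v @ W
print("max |error| due to device variation:", np.max(np.abs(analog - digital)))
```

The differential (two-array) mapping used here is one common way to represent signed weights with strictly positive conductances; the per-device multiplicative noise stands in for the programming variability and read noise that the article discusses as key practical challenges.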
