Analog architectures for neural network acceleration based on non-volatile memory

Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware on data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven challenging, principally because of the overhead imposed by the peripheral circuitry and the non-ideal properties of the memory devices that play the role of the synapse. We review existing implementations of these accelerators for deep supervised learning, organizing the discussion around the levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.
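To make the core operation concrete, the following NumPy sketch models the idealized analog vector-matrix multiply that these accelerators perform inside the memory array, together with two of the non-idealities mentioned above: a finite number of programmable conductance states and read noise. It is purely illustrative; the helper names (`program_conductances`, `crossbar_mvm`) and all parameter values (conductance range, 64 levels, noise magnitude) are our own assumptions, not taken from any reviewed design.

```python
import numpy as np

rng = np.random.default_rng(0)

def program_conductances(weights, g_min=1e-6, g_max=1e-4, levels=64):
    """Map a signed weight matrix onto a differential pair of conductance
    arrays (G_pos, G_neg), so that negative weights are represented with
    physically positive conductances. `levels` models the finite number
    of programmable conductance states of the device."""
    scale = (g_max - g_min) / np.abs(weights).max()
    g_pos = g_min + scale * np.clip(weights, 0.0, None)
    g_neg = g_min + scale * np.clip(-weights, 0.0, None)
    # Quantize to the available conductance levels -- one of the device
    # non-idealities that limits the precision of the analog multiply.
    step = (g_max - g_min) / (levels - 1)
    g_pos = g_min + np.round((g_pos - g_min) / step) * step
    g_neg = g_min + np.round((g_neg - g_min) / step) * step
    return g_pos, g_neg, scale

def crossbar_mvm(x, g_pos, g_neg, scale, v_read=0.2, noise_sigma=0.01):
    """Idealized analog MVM: inputs drive the crossbar rows as read
    voltages, each column sums currents I_j = sum_i G[i, j] * V[i]
    (Ohm's law plus Kirchhoff's current law), and the differential pair
    of columns recovers the signed result. Gaussian read noise stands in
    for the many device/circuit non-idealities a real design must
    tolerate."""
    v = v_read * x                       # input encoding on the rows
    i = v @ g_pos - v @ g_neg            # column currents, differential
    i += noise_sigma * np.abs(i).mean() * rng.standard_normal(i.shape)
    return i / (v_read * scale)          # rescale currents back to x @ W

# Compare the analog result against the exact digital MVM.
W = rng.standard_normal((128, 64))       # rows = inputs, columns = outputs
x = rng.standard_normal(128)
g_pos, g_neg, scale = program_conductances(W)
y = crossbar_mvm(x, g_pos, g_neg, scale)
err = np.linalg.norm(y - x @ W) / np.linalg.norm(x @ W)
print(f"relative error vs. digital MVM: {err:.4f}")
```

In a physical array, the input encoding, current sensing, and rescaling shown here are carried out by peripheral DACs, amplifiers, and ADCs, which is precisely where much of the overhead discussed above arises.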
