Design Space and Memory Technology Co-exploration for In-Memory Computing Based Machine Learning Accelerators

In-Memory Computing (IMC) has emerged as a promising paradigm for accelerating machine learning (ML) inference. While IMC architectures built on various memory technologies have demonstrated higher throughput and energy efficiency than conventional digital architectures, little work has provided comprehensive and fair system-level comparisons of different memory technologies under the same hardware (area) budget. Because large-scale analog IMC hardware relies on costly analog-to-digital converters (ADCs) for robust digital communication, optimizing IMC architecture performance requires synergistic co-design of the memory arrays and peripheral ADCs, and the resulting trade-offs can depend on the underlying memory technology. To this end, we co-explore the IMC macro design space and memory technology to identify the best design point for each memory type under iso-area budgets, enabling fair comparisons among technologies including SRAM, phase-change memory, resistive RAM, ferroelectrics, and spintronics. First, we develop an extended simulation framework that models a spatial architecture with off-chip DRAM and can integrate both CMOS and nonvolatile memory technologies. We then propose different ADC operating modes with distinct weight-mapping schemes to cope with different on-chip area budgets. Our results show that under an iso-area budget, the memory technologies evaluated require different IMC macro-level designs to deliver the optimal system-level energy-delay product (EDP). We further demonstrate that under small area budgets, the best memory technology is determined by both its cell area and its write energy, whereas under larger area budgets, cell area becomes the dominant factor for technology selection.
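The iso-area co-exploration described above can be viewed as a constrained sweep: for each memory technology, enumerate candidate IMC macro configurations (array dimensions, ADC precision, column-to-ADC sharing), discard any configuration exceeding the area budget, and select the one that minimizes system-level EDP. The Python sketch below illustrates this loop; it is a minimal sketch under stated assumptions, not the paper's calibrated framework. The technology parameters, per-ADC areas, and the toy EDP model (which charges write energy for weight reloads when on-chip capacity is insufficient) are illustrative placeholders.

```python
# Minimal sketch of an iso-area IMC design-space sweep (illustrative only).
# All device/ADC numbers are hypothetical placeholders, not measured data.

from dataclasses import dataclass
from itertools import product

@dataclass
class MemTech:
    name: str
    cell_area_um2: float    # bitcell area
    write_energy_pj: float  # per-cell programming energy

# Placeholder technology parameters (for illustration only).
TECHS = [
    MemTech("SRAM",     0.30, 0.05),
    MemTech("RRAM",     0.05, 2.0),
    MemTech("PCM",      0.06, 10.0),
    MemTech("FeFET",    0.08, 0.5),
    MemTech("SOT-MRAM", 0.10, 1.0),
]

# Assumed per-ADC area versus precision (hypothetical values).
ADC_AREA_UM2 = {4: 500.0, 6: 1500.0, 8: 4500.0}

def macro_area(tech, rows, cols, adc_bits, cols_per_adc):
    """Array area plus shared-ADC peripheral area for one IMC macro."""
    array = rows * cols * tech.cell_area_um2
    adcs = (cols // cols_per_adc) * ADC_AREA_UM2[adc_bits]
    return array + adcs

def system_edp(tech, rows, cols, adc_bits, cols_per_adc, budget_um2):
    """Toy EDP model: too little on-chip capacity forces weight reloads,
    which cost write energy and extra latency."""
    capacity_bits = rows * cols
    n_macros = max(1, int(budget_um2 //
                          macro_area(tech, rows, cols, adc_bits, cols_per_adc)))
    weights_needed = 25e6  # e.g. a ResNet-scale model; illustrative count
    reloads = max(1.0, weights_needed / (n_macros * capacity_bits))
    energy = reloads * capacity_bits * tech.write_energy_pj  # pJ, toy model
    delay = reloads * rows / n_macros                        # arbitrary units
    return energy * delay

def best_design(tech, budget_um2):
    """Return (EDP, config) of the best feasible macro under the budget."""
    space = product([128, 256, 512],   # rows
                    [128, 256],        # cols
                    ADC_AREA_UM2,      # ADC precision (bits)
                    [8, 16, 32])       # columns shared per ADC
    feasible = [(system_edp(tech, r, c, b, s, budget_um2), (r, c, b, s))
                for r, c, b, s in space
                if macro_area(tech, r, c, b, s) <= budget_um2]
    return min(feasible) if feasible else None

for tech in TECHS:
    for budget in (1e5, 1e6):  # small vs. large area budget, in um^2
        print(tech.name, budget, best_design(tech, budget))
```

The sketch is meant to capture the abstract's qualitative trend rather than reproduce its numbers: with a small budget, frequent reloads make write energy dominate EDP, whereas a large budget keeps most weights resident, leaving cell area (capacity per unit area) as the dominant factor.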
