AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator

Always-on TinyML perception tasks in IoT applications require very high energy efficiency. Analog compute-in-memory (CiM) using non-volatile memory (NVM) promises high efficiency and also provides self-contained on-chip model storage. However, analog CiM introduces new practical considerations, including conductance drift, read/write noise, and fixed analog-to-digital converter (ADC) gain. These additional constraints must be addressed to achieve models that can be deployed on analog CiM with acceptable accuracy loss. This work describes AnalogNets: TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW). The model architectures are specifically designed for analog CiM, and we detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities and low-precision data converters at inference time. We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator, with a novel layer-serial approach that removes the cost of the complex interconnects associated with a fully pipelined design. We evaluate AnalogNets on a calibrated simulator as well as on real hardware, and find that accuracy degradation is limited to 0.8%/1.2% after 24 hours of PCM drift (8-bit) for KWS/VWW. AnalogNets running on the 14nm AON-CiM accelerator demonstrate 8.58/4.37 TOPS/W for the KWS/VWW workloads with 8-bit activations, respectively, increasing to 57.39/25.69 TOPS/W with 4-bit activations.
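The abstract does not spell out the training methodology, but noise-robust training for analog CiM is commonly realized by perturbing weights during the forward pass so the network learns to tolerate conductance variation. Below is a minimal, hypothetical sketch of that idea in PyTorch; the NoisyLinear layer and noise_std value are illustrative assumptions, not the paper's actual recipe.

```python
# Minimal sketch of noise-aware training for analog CiM deployment.
# Assumption (not confirmed by the abstract): robustness is obtained by
# injecting multiplicative Gaussian noise into the weights during the
# forward pass, so the model sees CiM-like perturbations at training time.
import torch
import torch.nn as nn


class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with Gaussian noise during
    training, loosely emulating PCM conductance variation."""

    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std  # hypothetical noise magnitude

    def forward(self, x):
        if self.training:
            # Noise only affects the forward value; gradients still flow
            # through the clean weights (straight-through style).
            noise = torch.randn_like(self.weight) * self.noise_std
            w = self.weight + self.weight.detach().abs() * noise
        else:
            w = self.weight
        return nn.functional.linear(x, w, self.bias)


# Usage: drop-in replacement for nn.Linear in a small KWS/VWW classifier head.
layer = NoisyLinear(64, 12, noise_std=0.05)
out = layer(torch.randn(8, 64))
```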
