AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator

Always-on TinyML perception tasks in IoT applications require very high energy efficiency. Analog compute-in-memory (CiM) using non-volatile memory (NVM) promises high efficiency and also provides self-contained on-chip model storage. However, analog CiM introduces new practical considerations, including conductance drift, read/write noise, and fixed analog-to-digital converter (ADC) gain. These additional constraints must be addressed to achieve models that can be deployed on analog CiM with acceptable accuracy loss. This work describes AnalogNets: TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW). The model architectures are specifically designed for analog CiM, and we detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities and low-precision data converters at inference time. We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator, with a novel layer-serial approach that removes the cost of the complex interconnects associated with a fully pipelined design. We evaluate AnalogNets on a calibrated simulator as well as on real hardware, and find that accuracy degradation is limited to 0.8%/1.2% after 24 hours of PCM drift (8-bit) for KWS/VWW. AnalogNets running on the 14nm AON-CiM accelerator demonstrate 8.58/4.37 TOPS/W for the KWS/VWW workloads with 8-bit activations, respectively, increasing to 57.39/25.69 TOPS/W with 4-bit activations.
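The abstract does not spell out the training methodology, but noise-robust training for analog CiM is commonly realized by perturbing weights during the forward pass so the network learns to tolerate conductance variation. Below is a minimal, hypothetical sketch of that idea in PyTorch; the NoisyLinear layer and noise_std value are illustrative assumptions, not the paper's actual recipe.

```python
# Minimal sketch of noise-aware training for analog CiM deployment.
# Assumption (not confirmed by the abstract): robustness is obtained by
# injecting multiplicative Gaussian noise into the weights during the
# forward pass, so the model sees CiM-like perturbations at training time.
import torch
import torch.nn as nn


class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with Gaussian noise during
    training, loosely emulating PCM conductance variation."""

    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std  # hypothetical noise magnitude

    def forward(self, x):
        if self.training:
            # Noise only affects the forward value; gradients still flow
            # through the clean weights (straight-through style).
            noise = torch.randn_like(self.weight) * self.noise_std
            w = self.weight + self.weight.detach().abs() * noise
        else:
            w = self.weight
        return nn.functional.linear(x, w, self.bias)


# Usage: drop-in replacement for nn.Linear in a small KWS/VWW classifier head.
layer = NoisyLinear(64, 12, noise_std=0.05)
out = layer(torch.randn(8, 64))
```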
