Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference

Analog/mixed-signal (AMS) computation can be more energy efficient than digital approaches for deep learning inference, but it incurs an accuracy penalty from precision loss. Prior AMS approaches focus on small networks/datasets, which can maintain accuracy even with 2-bit precision. We analyze the applicability of AMS approaches to larger networks by proposing a generic AMS error model, implementing it in an existing training framework, and investigating its effect on ImageNet classification with ResNet-50. We demonstrate significant accuracy recovery by exposing the network to AMS error during retraining, and we show that batch normalization layers are responsible for this accuracy recovery. We also introduce an energy model to predict the requirements of high-accuracy AMS hardware running large networks and use it to show that, for ADC-dominated designs, there is a direct tradeoff between energy efficiency and network accuracy. Our model predicts that achieving $<0.4\%$ accuracy loss on ResNet-50 with AMS hardware requires a computation energy of at least $\sim 300$ fJ/MAC. Finally, we propose methods for improving the energy-accuracy tradeoff.
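
To make the "generic AMS error model" concrete, the sketch below shows one plausible way to inject such error into a single analog matrix-vector product: additive Gaussian noise referred to the ADC input (parameterized in LSBs), followed by clipping and quantization at the ADC. The function name `ams_mac_with_error`, the noise parameterization, and the per-call full-scale choice are illustrative assumptions, not the paper's actual model; during retraining, each convolution or fully connected layer's accumulations would be wrapped in a model of this kind so that the network, and in particular its batch normalization statistics, can adapt to the error.

```python
import numpy as np

def ams_mac_with_error(x, W, adc_bits=8, noise_lsb=0.5, rng=None):
    """Hypothetical AMS error model for one analog matrix-vector product.

    Assumptions (illustrative, not the paper's exact formulation):
      * the analog accumulation itself is ideal, and all nonidealities are
        lumped into additive Gaussian noise referred to the ADC input,
        with standard deviation `noise_lsb` expressed in ADC LSBs;
      * the ADC clips to its full-scale range and quantizes to `adc_bits`.
    """
    rng = np.random.default_rng() if rng is None else rng

    y = x @ W                                   # ideal MAC accumulations

    # In real hardware the full scale is a fixed, calibrated constant; it is
    # derived from the observed outputs here only to keep the sketch
    # self-contained.
    full_scale = np.max(np.abs(y)) + 1e-12
    lsb = 2.0 * full_scale / (2 ** adc_bits)

    # Lumped analog noise (thermal, mismatch, charge injection, ...).
    y = y + rng.normal(0.0, noise_lsb * lsb, size=y.shape)

    # ADC transfer function: clip, then snap to the LSB grid.
    y = np.clip(y, -full_scale, full_scale)
    return np.round(y / lsb) * lsb


# Example: a 16-input analog tile computing four dot products at once.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W = rng.standard_normal((16, 4))
print(ams_mac_with_error(x, W, adc_bits=8, noise_lsb=0.5, rng=rng))
```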
