Enabling Energy-Efficient and Reliable Neural Network via Neuron-Level Voltage Scaling

As the platforms running deep neural networks (DNNs) move from large-scale data centers to handheld devices, power emerges as one of the most significant obstacles. Voltage scaling is a promising power-saving technique. Nevertheless, it raises reliability and performance concerns: lowering the supply voltage may degrade both the accuracy and the performance of NNs. Consequently, NNs require an energy-efficient and reliable scheme that balances power, accuracy, and performance while maintaining a satisfactory user experience. To this end, we propose a neuron-level voltage scaling framework called NN-APP, which models the impact of supply voltage on NNs from the perspectives of output accuracy (A), power (P), and performance (P). We analyze the error propagation characteristics of NNs both across layers (inter-layer) and within a layer (intra-layer) to precisely model, at the neuron level, the impact of voltage scaling on the final output accuracy. Furthermore, we combine a voltage clustering method with multi-objective optimization to identify the optimal voltage islands, applying the same voltage to neurons with similar fault tolerance capability. We perform three case studies to demonstrate the efficacy of the proposed techniques.
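
To make the idea of voltage islands concrete, the following is a minimal sketch of grouping neurons by fault tolerance and giving each group one supply voltage. It is not the NN-APP method: the per-neuron `tolerance` scores, the function names, the candidate voltage levels, and the use of plain 1-D k-means in place of the voltage clustering and multi-objective optimization are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def cluster_1d(values: Sequence[float], k: int, iters: int = 50) -> Tuple[List[int], List[float]]:
    """Plain 1-D k-means (Lloyd's algorithm), used here only as a stand-in.

    Returns a cluster index per value and the final centroids.
    """
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centroids evenly spaced over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def assign_voltage_islands(tolerance: Sequence[float], voltages: Sequence[float]) -> List[float]:
    """Give each neuron a supply voltage; more tolerant neurons get a lower voltage.

    `tolerance` is a hypothetical per-neuron fault tolerance score (higher means
    more tolerant) and `voltages` is a hypothetical list of available levels.
    """
    k = len(voltages)
    labels, centroids = cluster_1d(tolerance, k)
    # Rank clusters from most to least tolerant and pair them with ascending voltages.
    order = sorted(range(k), key=lambda c: centroids[c], reverse=True)
    levels = sorted(voltages)
    cluster_to_voltage = {c: levels[rank] for rank, c in enumerate(order)}
    return [cluster_to_voltage[lab] for lab in labels]


if __name__ == "__main__":
    scores = [0.91, 0.88, 0.15, 0.52, 0.12, 0.49]  # illustrative values only
    print(assign_voltage_islands(scores, [0.7, 0.8, 0.9]))
```

In this sketch the number of islands equals the number of candidate voltage levels. A real design would also weigh the power and performance cost of each level, which is the role the abstract assigns to multi-objective optimization.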
