Significance-Aware Program Execution on Unreliable Hardware

This article introduces a significance-centric programming model and runtime support that sets the supply voltage in a multicore CPU to sub-nominal values to reduce the energy footprint and provide mechanisms to control output quality. The developers specify the significance of application tasks respecting their contribution to the output quality and provide check and repair functions for handling faults. On a multicore system, we evaluate five benchmarks using an energy model that quantifies the energy reduction. When executing the least-significant tasks unreliably, our approach leads to 20% CPU energy reduction with respect to a reliable execution and has minimal quality degradation.

[1]  Gerhard Wellein,et al.  LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[2]  Uwe Naumann,et al.  Towards automatic significance analysis for approximate computing , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[3]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[4]  Martin C. Rinard,et al.  Approximate computation with outlier detection in Topaz , 2015, OOPSLA.

[5]  Dimitrios S. Nikolopoulos,et al.  A significance-driven programming framework for energy-constrained approximate computing , 2015, Conf. Computing Frontiers.

[6]  Karthikeyan Sankaralingam,et al.  Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.

[7]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[8]  Meeta Sharma Gupta,et al.  Tribeca: Design for PVT variations with local recovery and fine-grained adaptation , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[10]  William Jalby,et al.  Evaluation of CPU frequency transition latency , 2014, Computer Science - Research and Development.

[11]  Michael Engel,et al.  Improving the fault resilience of an H.264 decoder using static analysis methods , 2013, TECS.

[12]  Ravishankar K. Iyer,et al.  Recent advances and new avenues in hardware-level reliability support , 2005, IEEE Micro.

[13]  Simon J. Godsill,et al.  On sequential Monte Carlo sampling methods for Bayesian filtering , 2000, Stat. Comput..

[14]  Dan Grossman,et al.  Monitoring and Debugging the Quality of Results in Approximate Programs , 2015, ASPLOS.

[15]  Robert Bruce Findler,et al.  Exploring circuit timing-aware language and compilation , 2011, ASPLOS XVI.

[16]  Gerhard Wellein,et al.  LIKWID: Lightweight Performance Tools , 2011, CHPC.

[17]  André DeHon,et al.  Energy Reduction through Differential Reliability and Lightweight Checking , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.

[18]  Martin C. Rinard Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.

[19]  Foivos S. Zakkak,et al.  Inference and declaration of independence: Impact on deterministic task parallelism , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[20]  André DeHon,et al.  Semantic-Aware Hot Data Selection Policy for Flash File System in Android-Based Smartphones , 2013, FCCM 2013.

[21]  Christos D. Antonopoulos,et al.  GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[22]  Muhammad Shafique,et al.  Reliable software for unreliable hardware: Embedded code generation aiming at reliability , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[23]  Scott A. Mahlke,et al.  Input responsiveness: using canary inputs to dynamically steer approximation , 2016, PLDI.

[24]  Luca Benini,et al.  Variation-tolerant OpenMP tasking on tightly-coupled processor clusters , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[25]  William V. Huott,et al.  Comparison of Split-Versus Connected-Core Supplies in the POWER6 Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[26]  Martin C. Rinard,et al.  Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.

[27]  Andreas Peter Burg,et al.  Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[28]  Alireza Ejlali,et al.  DRVS: Power-efficient reliability management through Dynamic Redundancy and Voltage Scaling under variations , 2015, 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[29]  Qiang Xu,et al.  ApproxIt: An approximate computing framework for iterative methods , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[30]  Régis Leveugle,et al.  Statistical fault injection: Quantified error and confidence , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[31]  Muhammad Shafique,et al.  Cross-Layer Software Dependability on Unreliable Hardware , 2016, IEEE Transactions on Computers.

[32]  John Sartori,et al.  On software design for stochastic processors , 2012, DAC Design Automation Conference 2012.

[33]  Thu D. Nguyen,et al.  ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.

[34]  S. M. Faisal,et al.  b-HiVE: A bit-level history-based error model with value correlation for voltage-scaled integer and floating point units , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[35]  Subhasish Mitra,et al.  ERSA: Error Resilient System Architecture for probabilistic applications , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[36]  Michael D. Smith,et al.  Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[37]  David Blaauw,et al.  Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[38]  Luca Benini,et al.  Analysis of instruction-level vulnerability to dynamic voltage and temperature variations , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[39]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[40]  Sanjay Pant,et al.  A self-tuning DVS processor using delay-error detection and correction , 2005, IEEE Journal of Solid-State Circuits.

[41]  David Blaauw,et al.  Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation , 2003, MICRO.

[42]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[43]  Scott A. Mahlke,et al.  Rumba: An online quality management system for approximate computing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).