VarEMU: An emulation testbed for variability-aware software

Modern integrated circuits, fabricated in nanometer technologies, suffer from significant power and performance variation across a chip, from chip to chip, and over time due to aging and ambient fluctuations. Furthermore, several existing and emerging reliability loss mechanisms have increased transient, intermittent, and permanent failure rates. While this variability has typically been addressed by process, device, and circuit designers, there has been a recent push towards sensing and adapting to variability in the various layers of software. Current hardware platforms, however, typically lack variability sensing capabilities. Even if sensing capabilities were available, evaluating variability-aware software techniques across a significant number of hardware samples would prove exceedingly costly and time-consuming. We introduce VarEMU, an extension to the QEMU virtual machine monitor that serves as a framework for the evaluation of variability-aware software techniques. VarEMU provides users with the means to emulate variations in power consumption and in fault characteristics, and to sense and adapt to these variations in software. By setting (and dynamically changing) parameters in a power model, users can create virtual machines that feature both static and dynamic variations in power consumption. Faults may be injected before or after the execution of any instruction, or may replace its execution entirely. Power consumption and susceptibility to faults are also subject to dynamic change according to an aging model. A software stack for VarEMU features precise control over faults and provides virtual energy monitors to the operating system and processes. This allows users to precisely quantify and evaluate the effects of variations on individual applications. We show how VarEMU tracks energy consumption according to variation-aware power and aging models and give examples of how it may be used to quantify how faults in instruction execution affect applications.
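
To make the two mechanisms described above concrete, the following is a minimal Python sketch, not VarEMU's actual interface. It assumes a deliberately simple power model (a fixed active power per instruction class, converted to energy via cycles and clock frequency) and represents a fault as a callback applied before, after, or instead of an instruction. The names `EnergyMonitor`, `retire`, and `execute`, the instruction classes, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class EnergyMonitor:
    """Toy virtual energy monitor (hypothetical, not VarEMU's interface)."""
    active_power_w: Dict[str, float]  # assumed per-class power; may change at run time
    freq_hz: float
    energy_j: float = 0.0

    def retire(self, insn_class: str, cycles: int) -> None:
        # Energy = power * time, where time = cycles / frequency.
        self.energy_j += self.active_power_w[insn_class] * cycles / self.freq_hz


# A fault is a (mode, callback) pair; mode is "before", "after", or "replace".
Fault = Tuple[str, Callable]


def execute(op: Callable[[int, int], int], a: int, b: int,
            fault: Optional[Fault] = None) -> int:
    """Run a two-operand instruction, optionally perturbed by a fault."""
    if fault is None:
        return op(a, b)
    mode, fn = fault
    if mode == "replace":      # the fault stands in for the instruction
        return fn(a, b)
    if mode == "before":       # the fault corrupts the operands first
        a, b = fn(a, b)
        return op(a, b)
    if mode == "after":        # the fault corrupts the result
        return fn(op(a, b))
    raise ValueError(f"unknown fault mode: {mode}")


if __name__ == "__main__":
    add = lambda x, y: x + y
    mon = EnergyMonitor({"alu": 0.1, "mem": 0.2}, freq_hz=1e8)
    mon.retire("alu", 100)
    # A dynamic parameter change, standing in for aging or ambient variation.
    mon.active_power_w["alu"] = 0.2
    mon.retire("alu", 100)
    print(mon.energy_j)
    # Flip the least significant bit of the result of an addition.
    print(execute(add, 2, 3, fault=("after", lambda r: r ^ 1)))
```

Keeping the power parameters in a mutable mapping is what lets the same virtual machine exhibit both static variation (different initial values per instance) and dynamic variation (values updated while it runs); a real aging model would drive those updates rather than a manual assignment.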
