NSF expedition on variability-aware software: Recent results and contributions

Abstract: In this paper, we summarize recent results and contributions from the NSF Expedition on Variability-Aware Software, a five-year, multi-university effort to tackle the problem of hardware variation and its implications and opportunities in software. The Expedition has made contributions in the characterization and online monitoring of variations (particularly in microprocessors and flash memories), proposed new coding techniques for variability-tolerant storage, provided tools and platforms for the development of variability-aware software, and created new runtime support systems for variability-aware task scheduling and execution.
