Eliminating voltage emergencies via software-guided code transformations

In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, but when left unattended, these voltage fluctuations can lead to timing violations or even transistor lifetime issues. In this article, we present a hardware--software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a runtime software layer reschedules the program's instruction stream to prevent recurring violations at the same program location. The runtime layer, combined with the proposed code-rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the ongoing industry-standard approach to circumvent the issue altogether by optimizing for the worst-case voltage flux, which compromises power and performance efficiency severely, especially looking ahead to future technology generations. Existing conservative approaches will have severe implications on the ability to deliver efficient microprocessors. The proposed technique reassembles a traditional reliability problem as a runtime performance optimization problem, thus allowing us to design processors for typical case operation by building intelligent algorithms that can prevent recurring violations.

[1]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[2]  Mark C. Toburen Power analysis and instruction scheduling for reduced di/dt in the execution core of high-performanc , 1999 .

[3]  Vivek Tiwari,et al.  An architectural solution for the inductive noise problem due to clock-gating , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[4]  Martin D. Westhead,et al.  A methodology for benchmarking Java Grande applications , 1999, JAVA '99.

[5]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[6]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  E. Duesterwald,et al.  Dynamo: a transparent dynamic optimization system , 2000, PLDI '00.

[8]  Milo M. K. Martin,et al.  Fast Checkpoint/Recovery to Support Kilo-Instruction Speculation and Hardware Fault Tolerance , 2000 .

[9]  Lorna Smith,et al.  Benchmarking Java Grande Applications , 2000 .

[10]  Jihong Kim,et al.  Power-aware modulo scheduling for high-performance VLIW processors , 2001, ISLPED '01.

[11]  J.F. Martinez,et al.  Cherry: Checkpointed early resource recycling in out-of-order microprocessors , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[12]  Vivek Tiwari,et al.  Microarchitectural simulation and control of di/dt-induced power supply voltage variation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[13]  T. N. Vijaykumar,et al.  Pipeline muffling and a priori current ramping: architectural techniques to reduce high-frequency inductive noise , 2003, ISLPED '03.

[14]  Margaret Martonosi,et al.  Control techniques to eliminate voltage emergencies in high performance processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[15]  H. Ando,et al.  A 1.3GHz fifth generation SPARC64 microprocessor , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[16]  Hiroyuki Sugiyama,et al.  A 1.3 GHz fifth generation SPARC64 microprocessor , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[17]  T. N. Vijaykumar,et al.  Exploiting resonant behavior to reduce inductive noise , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[18]  R. Garg,et al.  Adaptive incremental checkpointing for massively parallel systems , 2004, ICS '04.

[19]  David M. Brooks,et al.  Eliminating voltage emergencies via microarchitectural voltage control feedback and dynamic optimization , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[20]  Satish Narayanasamy,et al.  BugNet: continuously recording program execution for deterministic replay debugging , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[21]  José F. Martínez,et al.  Checkpointed early load retirement , 2005, 11th International Symposium on High-Performance Computer Architecture.

[22]  Kemal Aygun Power Delivery for High-Performance Microprocessors , 2005 .

[23]  Satish Narayanasamy,et al.  BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging , 2005, ISCA 2005.

[24]  Sanjay J. Patel,et al.  ReStore: Symptom-Based Soft Error Detection in Microprocessors , 2005, IEEE Transactions on Dependable and Secure Computing.

[25]  Brad Calder,et al.  Online performance auditing: using hot optimizations without getting burned , 2006, PLDI '06.

[26]  Todd M. Austin,et al.  Ultra low-cost defect protection for microprocessor pipelines , 2006, ASPLOS XII.

[27]  Sanjay J. Patel,et al.  ReStore: Symptom-Based Soft Error Detection in Microprocessors , 2006, IEEE Trans. Dependable Secur. Comput..

[28]  Thomas R. Gross,et al.  Online optimizations driven by hardware performance monitoring , 2007, PLDI '07.

[29]  Meeta Sharma Gupta,et al.  Towards a software approach to mitigate voltage emergencies , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[30]  William V. Huott,et al.  Comparison of Split-Versus Connected-Core Supplies in the POWER6 Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[31]  Giovanni Agosta,et al.  A parallel dynamic compiler for CIL bytecode , 2008, SIGP.

[32]  Nam Sung Kim,et al.  Energy-Efficient and Metastability-Immune Timing-Error Detection and Instruction-Replay-Based Recovery Circuits for Dynamic-Variation Tolerance , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[33]  Meeta Sharma Gupta,et al.  DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[34]  Srikanth Balasubramanian Power delivery for high performance microprocessors , 2008, Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08).

[35]  Meeta Sharma Gupta,et al.  An event-guided approach to reducing voltage noise in processors , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[36]  Jason Mars,et al.  A cross-layer approach to heterogeneity and reliability , 2009, 2009 7th IEEE/ACM International Conference on Formal Methods and Models for Co-Design.

[37]  Vasanth Bala,et al.  Dynamo: a transparent dynamic optimization system , 2000, SIGP.