Voltage emergency prediction: Using signatures to reduce operating margins

Inductive noise forces microprocessor designers to sacrifice performance in order to ensure correct and reliable operation of their designs. The possibility of wide fluctuations in supply voltage means that timing margins throughout the processor must be set pessimistically to protect against worst-case droops and surges. While sensor-based reactive schemes have been proposed to deal with voltage noise, inherent sensor delays limit their effectiveness. Instead, this paper describes a voltage emergency predictor that learns the signatures of voltage emergencies (the combinations of control flow and microarchitectural events leading up to them) and uses these signatures to prevent recurrence of the corresponding emergencies. In simulations of a representative superscalar microprocessor in which fluctuations beyond 4% of nominal voltage are treated as emergencies (an aggressive configuration), these signatures can pinpoint the likelihood of an emergency some 16 cycles ahead of time with 90% accuracy. This lead time allows machines to operate with much tighter voltage margins (4% instead of 13%) and up to 13.5% higher performance, which closely approaches the 14.2% performance improvement possible with an ideal oracle-based predictor.

[1]  Satish Narayanasamy,et al.  BugNet: continuously recording program execution for deterministic replay debugging , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[2]  T. N. Vijaykumar,et al.  Pipeline muffling and a priori current ramping: architectural techniques to reduce high-frequency inductive noise , 2003, ISLPED '03.

[3]  Srikanth Balasubramanian Power delivery for high performance microprocessors , 2008, Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08).

[4]  Michael C. Huang,et al.  Cherry: checkpointed early resource recycling in out-of-order microprocessors , 2002, MICRO.

[5]  Hiroyuki Sugiyama,et al.  A 1.3 GHz fifth generation SPARC64 microprocessor , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[6]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[7]  Milo M. K. Martin,et al.  Fast Checkpoint/Recovery to Support Kilo-Instruction Speculation and Hardware Fault Tolerance , 2000 .

[8]  Vivek Tiwari,et al.  An architectural solution for the inductive noise problem due to clock-gating , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[9]  Todd M. Austin,et al.  Ultra low-cost defect protection for microprocessor pipelines , 2006, ASPLOS XII.

[10]  Satish Narayanasamy,et al.  Patching Processor Design Errors with Programmable Hardware , 2007, IEEE Micro.

[11]  Yu Cao,et al.  Predictive Technology Model for Nano-CMOS Design Exploration , 2006, 2006 1st International Conference on Nano-Networks and Workshops.

[12]  Margaret Martonosi,et al.  Control techniques to eliminate voltage emergencies in high performance processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[13]  Sanjay J. Patel,et al.  ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[14]  T. N. Vijaykumar,et al.  Exploiting resonant behavior to reduce inductive noise , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[15]  Nam Sung Kim,et al.  Energy-Efficient and Metastability-Immune Timing-Error Detection and Instruction-Replay-Based Recovery Circuits for Dynamic-Variation Tolerance , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[16]  Onur Mutlu,et al.  Software-Based Online Detection of Hardware Defects Mechanisms, Architectural Support, and Evaluation , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[18]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[19]  Meeta Sharma Gupta,et al.  DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[20]  José F. Martínez,et al.  Checkpointed early load retirement , 2005, 11th International Symposium on High-Performance Computer Architecture.

[21]  Hsien-Hsin S. Lee,et al.  A Floorplan-Aware Dynamic Inductive Noise Controller for Reliable Processor Design , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[22]  Vivek Tiwari,et al.  Microarchitectural simulation and control of di/dt-induced power supply voltage variation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[23]  Meeta Sharma Gupta,et al.  Towards a software approach to mitigate voltage emergencies , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[24]  Meeta Sharma Gupta,et al.  An event-guided approach to reducing voltage noise in processors , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[25]  Satish Narayanasamy,et al.  Patching Processor Design Errors , 2006, 2006 International Conference on Computer Design.

[26]  William V. Huott,et al.  Comparison of Split-Versus Connected-Core Supplies in the POWER6 Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[27]  Brad Calder,et al.  Using SimPoint for accurate and efficient simulation , 2003, SIGMETRICS '03.

[28]  Todd M. Austin,et al.  Shielding against design flaws with field repairable control logic , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[29]  Meeta Sharma Gupta,et al.  Understanding Voltage Variations in Chip Multiprocessors using a Distributed Power-Delivery Network , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.