Orchestrator: Guarding Against Voltage Emergencies in Multithreaded Applications

Voltage emergencies (VEs) have become a critical challenge as feature sizes shrink and power capacity grows. Destructive core interference is a major source of VEs in multicore processors. We observe that applications following the single-program, multiple-data (SPMD) programming model tend to trigger domain-wide destructive core interference because their threads exhibit similar power activity. We analyze and quantify this effect and propose a low-cost solution, Orchestrator, that avoids voltage droop synergy among cores. Orchestrator uses thread scheduling to exploit thread diversity and smooth voltage droops in multicore architectures. It also accounts for the performance impact of thread migration. Experimental results show that Orchestrator significantly reduces VEs, thereby improving performance.
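The abstract does not spell out the scheduling policy, so the following is only a toy sketch of the general idea of spreading threads with similar power activity across voltage domains. It is not the paper's algorithm. The names `correlation` and `assign_threads`, the use of Pearson correlation between per-thread power traces as a diversity measure, and the greedy placement order are all assumptions made for illustration. Thread migration cost is not modeled.

```python
from statistics import mean


def correlation(a, b):
    """Pearson correlation of two equal-length power-activity traces."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat trace carries no swing to align with
    return cov / (va * vb) ** 0.5


def assign_threads(traces, num_domains, cores_per_domain):
    """Greedily place each thread in the voltage domain whose current
    members are least correlated with it (hypothetical heuristic).

    traces: dict mapping thread id -> list of power samples.
    Returns a list of domains, each a list of thread ids.
    """
    if len(traces) > num_domains * cores_per_domain:
        raise ValueError("more threads than available cores")
    domains = [[] for _ in range(num_domains)]
    for tid in sorted(traces):
        best, best_cost = None, None
        for d, members in enumerate(domains):
            if len(members) >= cores_per_domain:
                continue  # domain is full
            # Summed correlation approximates how strongly this thread's
            # power swings would line up with those already in the domain.
            cost = sum(correlation(traces[tid], traces[m]) for m in members)
            if best_cost is None or cost < best_cost:
                best, best_cost = d, cost
        domains[best].append(tid)
    return domains


# Threads 0 and 1 swing in phase; threads 2 and 3 swing in the opposite phase.
traces = {
    0: [1, 5, 1, 5],
    1: [1, 5, 1, 5],
    2: [5, 1, 5, 1],
    3: [5, 1, 5, 1],
}
placement = assign_threads(traces, num_domains=2, cores_per_domain=2)
```

In this example the two in-phase threads land in different domains, and each is paired with an out-of-phase thread, so the power swings within a domain partly cancel rather than reinforce each other. A realistic scheduler would also have to weigh the cost of each migration against the expected reduction in droops.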
