Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores

Consider a multithreaded parallel application running inside a multicore virtual machine context that is itself hosted on a multi-socket multicore physical machine. How should the VMM map virtual cores to physical cores? We compare a local mapping, which compacts virtual cores to processor sockets, and an interleaved mapping, which spreads them over the sockets. Simply choosing between these two mappings exposes clear tradeoffs between performance, energy, and power. We then describe the design, implementation, and evaluation of a system that automatically and dynamically chooses between the two mappings. The system consists of a set of efficient online VMM-based mechanisms and policies that (a) capture the relevant characteristics of memory reference behavior, (b) provide a policy and mechanism for configuring the mapping of virtual machine cores to physical cores that optimizes for power, energy, or performance, and (c) drive dynamic migrations of virtual cores among local physical cores based on the workload and the currently specified objective. Using these techniques we demonstrate that the performance of SPEC and PARSEC benchmarks can be increased by as much as 66%, energy reduced by as much as 31%, and power reduced by as much as 17%, depending on the optimization objective.
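The difference between the two mappings can be illustrated with a minimal sketch. It assumes a homogeneous machine described only by a socket count and a per-socket core count, and it represents a physical core as a `(socket, core)` pair. The function names `local_mapping`, `interleaved_mapping`, and `sockets_used` are hypothetical and are not taken from the system described above; the sketch shows placement only, not the measurement or migration mechanisms.

```python
def _check_capacity(num_vcores, num_sockets, cores_per_socket):
    """Reject configurations that cannot give each virtual core its own physical core."""
    if num_sockets <= 0 or cores_per_socket <= 0:
        raise ValueError("socket and core counts must be positive")
    if not 0 <= num_vcores <= num_sockets * cores_per_socket:
        raise ValueError("more virtual cores than physical cores")


def local_mapping(num_vcores, num_sockets, cores_per_socket):
    """Compact placement: fill one socket completely before using the next."""
    _check_capacity(num_vcores, num_sockets, cores_per_socket)
    return [(v // cores_per_socket, v % cores_per_socket)
            for v in range(num_vcores)]


def interleaved_mapping(num_vcores, num_sockets, cores_per_socket):
    """Spread placement: assign virtual cores to sockets in round-robin order."""
    _check_capacity(num_vcores, num_sockets, cores_per_socket)
    return [(v % num_sockets, v // num_sockets)
            for v in range(num_vcores)]


def sockets_used(mapping):
    """Number of distinct sockets that host at least one virtual core."""
    return len({socket for socket, _ in mapping})


if __name__ == "__main__":
    # Hypothetical machine: 2 sockets with 4 cores each, hosting a 4-core VM.
    local = local_mapping(4, 2, 4)
    spread = interleaved_mapping(4, 2, 4)
    print("local:      ", local, "sockets used:", sockets_used(local))
    print("interleaved:", spread, "sockets used:", sockets_used(spread))
```

In this example the local mapping keeps all four virtual cores on one socket, while the interleaved mapping places two on each socket. A dynamic system of the kind described would switch between such placements at run time based on observed behavior and the chosen objective.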
