Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures

In modern computers, non-performance metrics such as energy consumption have become increasingly important, requiring a tradeoff with performance. Recent work has proposed performance-guaranteed energy management, but it is designed specifically for sequential applications and cannot be applied to a large class of multithreaded applications running on high-end computers and data servers. To address this problem, this paper makes the first attempt to provide performance-guaranteed energy management for multithreaded applications on multiprocessor architectures. We first conduct a comprehensive study of the effects of energy adaptation on thread synchronization and show that a multithreaded application suffers not only from local slowdowns due to energy adaptation, but also from significant slowdowns propagated from other threads through synchronization. Based on these findings, we design three Synchronization-Aware (SA) algorithms, LWT (Lock Waiting Time-based), CSL (Critical Section Length-based), and ODP (Operation Delay Propagation-based), to estimate the energy adaptation-induced slowdown of each thread. The local slowdowns are then combined across threads via one of three aggregation methods (MAX, AVG, and SUM) to estimate the overall application slowdown. We evaluate our methods using a large multithreaded commercial application, IBM DB2 with industrial-strength online transaction processing (OLTP) workloads, and six SPLASH parallel scientific applications. Our experimental results show that LWT combined with the MAX aggregation method not only keeps the performance slowdown within the specified limits but also conserves the most energy.
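The aggregation step can be illustrated with a minimal sketch. The abstract names the three methods but does not give their formulas, so this assumes they are the plain maximum, arithmetic mean, and sum of the per-thread slowdown estimates. The function names `aggregate_slowdown` and `within_limit`, the percent units, and the sample numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def aggregate_slowdown(per_thread_slowdowns, method="MAX"):
    """Combine per-thread slowdown estimates (in percent) into one value.

    Assumption: MAX, AVG and SUM are the ordinary max, mean and sum;
    the paper's exact definitions may differ.
    """
    if not per_thread_slowdowns:
        raise ValueError("need at least one per-thread estimate")
    if method == "MAX":
        return max(per_thread_slowdowns)
    if method == "AVG":
        return fmean(per_thread_slowdowns)
    if method == "SUM":
        return sum(per_thread_slowdowns)
    raise ValueError(f"unknown aggregation method: {method}")


def within_limit(per_thread_slowdowns, limit_percent, method="MAX"):
    """Return True if the aggregated estimate stays inside the allowed slowdown."""
    return aggregate_slowdown(per_thread_slowdowns, method) <= limit_percent


# Hypothetical per-thread estimates, e.g. as produced by an LWT-style estimator.
estimates = [2.0, 5.0, 3.0]
for name in ("MAX", "AVG", "SUM"):
    print(name, aggregate_slowdown(estimates, name), within_limit(estimates, 5.0, name))
```

Under these assumptions SUM yields the largest estimate and AVG the smallest, so the choice of aggregation method determines how conservatively an energy manager would throttle adaptation for the same per-thread inputs.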
