Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging

Multicore computing presents unique challenges for performance and power optimizations due to the multiplicity of cores and the complexity of interactions between the hardware resources. Understanding multicore power and its implications on application behavior is critical to the future of multicore software development. In this paper, we propose Watts-inside, a hardware-software cooperative framework that relies on the efficiency of hardware support to accurately gather application power profiles, and utilizes software support and causation principles for a more comprehensive understanding of application power. We show the design of our framework, along with certain optimizations that increase the ease of implementation. We present a case study using two real applications, Ocean (Splash-2) and Streamcluster (Parsec-1.0) where, with the help of feedback from Watts-inside framework, we made simple code modifications and realized up to 5% power savings on chip power consumption.

[1]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[2]  Nikolas Ioannou,et al.  Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[3]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[5]  Antonio González,et al.  Energy-effective issue logic , 2001, ISCA 2001.

[6]  Margaret Martonosi,et al.  A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[7]  Paul D. Franzon,et al.  FreePDK: An Open-Source Variation-Aware Design Kit , 2007, 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07).

[8]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[9]  Lizy Kurian John,et al.  Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[10]  Mahadev Satyanarayanan,et al.  PowerScope: a tool for profiling the energy usage of mobile applications , 1999, Proceedings WMCSA'99. Second IEEE Workshop on Mobile Computing Systems and Applications.

[11]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12]  Shrirang M. Yardi,et al.  CAMP: A technique to estimate per-structure power at run-time using a few simple parameters , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[13]  Bishop Brock,et al.  Introducing the Adaptive Energy Management Features of the Power7 Chip , 2011, IEEE Micro.

[14]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[15]  Parthasarathy Ranganathan,et al.  Energy-Driven Statistical Sampling: Detecting Software Hotspots , 2002, PACS.

[16]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[17]  Chen Ding,et al.  The energy impact of aggressive loop fusion , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[18]  Guru Venkataramani,et al.  The Need for Power Debugging in the Multi-Core Environment , 2012, IEEE Computer Architecture Letters.

[19]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[21]  Efraim Rotem,et al.  Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.

[22]  Michael C. Huang,et al.  The thrifty barrier: energy-aware synchronization in shared-memory multiprocessors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[23]  Amin Ansari,et al.  Bundled execution of recurring traces for energy-efficient general purpose processing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[24]  Sharad Malik,et al.  Instruction level power analysis and optimization of software , 1996, Proceedings of 9th International Conference on VLSI Design.

[25]  Karthikeyan Sankaralingam,et al.  Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[26]  Shyamkumar Thoziyoor,et al.  CACTI 5 . 1 , 2008 .

[27]  Björn Franke,et al.  Cooperative partitioning: Energy-efficient cache partitioning for high-performance CMPs , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[28]  John Sartori,et al.  Power balanced pipelines , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[29]  Gu-Yeon Wei,et al.  Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.

[30]  裕幸 飯田,et al.  International Technology Roadmap for Semiconductors 2003の要求清浄度について - シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について - , 2004 .

[31]  Pradip Bose,et al.  Abstraction and microarchitecture scaling in early-stage power modeling , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.