Evaluating Power-Monitoring Capabilities on IBM Blue Gene/P and Blue Gene/Q

Power consumption is becoming a critical factor as we continue our quest toward exascale computing. Yet the actual power utilization of a complete system remains an insufficiently studied research area. Estimating the power consumption of a large-scale system is a nontrivial task because a large number of components are involved and because power requirements are affected by (unpredictable) workloads. Clearly needed is a power-monitoring infrastructure that can provide timely and accurate feedback to system developers and application writers so that they can optimize the use of this precious resource. Many existing large-scale installations do feature power-monitoring sensors; however, those sensors are part of environmental- and health-monitoring subsystems and were not designed with application-level power measurements in mind. In this paper, we evaluate the existing power-monitoring infrastructure of IBM Blue Gene systems, with the goal of understanding what capabilities are available and how they fare with respect to spatial and temporal resolution, accuracy, latency, and other characteristics. We find that with a careful choice of dedicated microbenchmarks, we can obtain meaningful power consumption data even on Blue Gene/P, where the interval between available data points is measured in minutes. We next evaluate the monitoring subsystem on Blue Gene/Q and use it to study the power characteristics of the Blue Gene/Q FPU and memory subsystems. We find the monitoring subsystem capable of providing second-scale resolution of power data, conveniently separated by node component, with a latency of seven seconds. This represents a significant improvement in power-monitoring infrastructure, and we hope that future systems will enable real-time power measurement in order to better understand application behavior at a finer granularity.
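The claim that minute-scale data can still be meaningful rests on running each microbenchmark long enough that several complete sampling intervals fall inside it. Readings whose interval overlaps the start or end of the run mix benchmark and idle power, so they are discarded. The sketch below illustrates that reasoning under stated assumptions. The `Sample` record, the convention that each reading averages the interval ending at its timestamp, and the idle baseline are all hypothetical. None of them is taken from the Blue Gene/P monitoring interface or from the paper's own procedure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Sample:
    """One reading from a hypothetical coarse-grained power monitor.

    Assumption (not from the Blue Gene/P interface): `watts` is the
    average power over the interval (t_end - interval_s, t_end].
    """
    t_end: float  # end of the averaging interval, in seconds
    watts: float  # average power over that interval, in watts


def steady_state_power(samples: List[Sample], interval_s: float,
                       run_start: float, run_end: float) -> float:
    """Mean power over samples whose whole interval lies inside the run.

    Samples straddling the start or end of the benchmark mix idle and
    loaded power, so they are excluded.
    """
    inside = [s.watts for s in samples
              if s.t_end - interval_s >= run_start and s.t_end <= run_end]
    if not inside:
        raise ValueError("no complete sampling interval fits inside the run; "
                         "lengthen the benchmark")
    return mean(inside)


def dynamic_power(samples: List[Sample], interval_s: float,
                  run_start: float, run_end: float,
                  idle_watts: float) -> float:
    """Power attributable to the benchmark above a measured idle baseline."""
    return steady_state_power(samples, interval_s,
                              run_start, run_end) - idle_watts


if __name__ == "__main__":
    # Hypothetical readings every 240 s; the benchmark runs from t=100 s to t=1400 s.
    readings = [Sample(240, 20000.0), Sample(480, 25000.0),
                Sample(720, 25200.0), Sample(960, 24800.0),
                Sample(1200, 25000.0), Sample(1440, 22000.0)]
    # Only the intervals ending at 480, 720, 960, and 1200 s are fully inside the run.
    print(steady_state_power(readings, 240.0, 100.0, 1400.0))
    print(dynamic_power(readings, 240.0, 100.0, 1400.0, idle_watts=18000.0))
```

With a 240 s interval, the 1300 s run in this example yields four usable readings. A run shorter than roughly two sampling intervals may yield none, which is why the function raises an error rather than returning a misleading edge-contaminated average.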
