Large-Memory Nodes for Energy Efficient High-Performance Computing

Energy consumption is by far the largest contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have received significant research and development effort, but a simple measure that can dramatically reduce energy consumption is often overlooked. We show that, in capacity computing, where many small- to medium-sized jobs have to be solved at the lowest cost, a practical energy-saving approach is to scale-in the application on large-memory nodes. We evaluate scaling-in, i.e., decreasing the number of application processes and compute nodes (servers) used to solve a fixed-size problem, with a set of HPC applications running on a production system. Using standard-memory nodes, we obtain average energy savings of 36%, a substantial saving in its own right. We show that the main source of these savings is a reduction in node-hours (node-hours = number of nodes × execution time), which follows from a more efficient use of hardware resources. Scaling-in is limited by the per-node memory capacity, so we consider using large-memory nodes to enable a greater degree of scaling-in. We show that the resulting additional energy savings, of up to 52%, mean that in many cases the investment in upgrading the hardware would be recovered within a typical system lifetime of less than five years.
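To make the node-hours metric and the payback reasoning concrete, here is a minimal Python sketch (not part of the paper). Only the 36% and "up to 52%" savings are taken from the abstract; the per-node power draw, energy price, upgrade cost, and job throughput are illustrative assumptions, as is the reading of the 52% figure as an additional saving on top of the standard-memory case.

```python
# Minimal illustrative sketch of the node-hours metric and payback reasoning.
# Only the 36% and "up to 52%" savings come from the abstract; every other
# figure below (power draw, energy price, upgrade cost, job throughput) is an
# assumption chosen purely for illustration.

def node_hours(num_nodes: int, exe_time_h: float) -> float:
    """node-hours = number of nodes x execution time (hours)."""
    return num_nodes * exe_time_h

def energy_kwh(nh: float, watts_per_node: float = 350.0) -> float:
    """Convert node-hours to kWh, assuming a fixed average per-node power draw."""
    return nh * watts_per_node / 1000.0

# Assumed baseline job: 16 standard-memory nodes running for 4 hours.
baseline_nh = node_hours(16, 4.0)

# Scaling-in on standard-memory nodes: 36% average energy saving (abstract),
# attributed mainly to the reduction in node-hours.
standard_kwh = energy_kwh(baseline_nh) * (1 - 0.36)

# Large-memory nodes: read here as an additional saving of up to 52% relative
# to the standard-memory case (one possible reading of the abstract).
large_kwh = standard_kwh * (1 - 0.52)

# Payback estimate for the hardware upgrade, against a five-year lifetime.
price_eur_per_kwh = 0.15          # assumed energy price
upgrade_cost_eur = 4000.0         # assumed cost of the large-memory upgrade
jobs_per_year = 1000              # assumed throughput of such jobs
saving_per_job = (standard_kwh - large_kwh) * price_eur_per_kwh
payback_years = upgrade_cost_eur / (saving_per_job * jobs_per_year)

print(f"baseline: {baseline_nh:.0f} node-hours, {energy_kwh(baseline_nh):.1f} kWh")
print(f"standard-memory scaled-in: {standard_kwh:.1f} kWh")
print(f"large-memory scaled-in: {large_kwh:.1f} kWh")
print(f"estimated payback: {payback_years:.1f} years (vs. 5-year lifetime)")
```

With these assumed figures the upgrade pays for itself in roughly 3.6 years, i.e., within the five-year system lifetime mentioned in the abstract; different assumptions shift the break-even point accordingly.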
