Large-Memory Nodes for Energy Efficient High-Performance Computing

Energy consumption is by far the largest contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have received significant research and development effort, but a simple measure that can dramatically reduce energy consumption is often overlooked. We show that, in capacity computing, where many small- to medium-sized jobs have to be solved at the lowest cost, a practical energy-saving approach is to scale-in the application on large-memory nodes. We evaluate scaling-in, i.e., decreasing the number of application processes and compute nodes (servers) used to solve a fixed-size problem, with a set of HPC applications running on a production system. Using standard-memory nodes, we obtain average energy savings of 36%, a substantial saving in its own right. We show that the main source of these savings is a reduction in node-hours (node-hours = number of nodes × execution time), which follows from a more efficient use of hardware resources. Scaling-in is limited by the per-node memory capacity, so we consider using large-memory nodes to enable a greater degree of scaling-in. We show that the resulting additional energy savings, of up to 52%, mean that in many cases the investment in upgrading the hardware would be recovered within a typical system lifetime of less than five years.
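To make the node-hours metric and the payback reasoning concrete, here is a minimal Python sketch (not part of the paper). Only the 36% and "up to 52%" savings are taken from the abstract; the per-node power draw, energy price, upgrade cost, and job throughput are illustrative assumptions, as is the reading of the 52% figure as an additional saving on top of the standard-memory case.

```python
# Minimal illustrative sketch of the node-hours metric and payback reasoning.
# Only the 36% and "up to 52%" savings come from the abstract; every other
# figure below (power draw, energy price, upgrade cost, job throughput) is an
# assumption chosen purely for illustration.

def node_hours(num_nodes: int, exe_time_h: float) -> float:
    """node-hours = number of nodes x execution time (hours)."""
    return num_nodes * exe_time_h

def energy_kwh(nh: float, watts_per_node: float = 350.0) -> float:
    """Convert node-hours to kWh, assuming a fixed average per-node power draw."""
    return nh * watts_per_node / 1000.0

# Assumed baseline job: 16 standard-memory nodes running for 4 hours.
baseline_nh = node_hours(16, 4.0)

# Scaling-in on standard-memory nodes: 36% average energy saving (abstract),
# attributed mainly to the reduction in node-hours.
standard_kwh = energy_kwh(baseline_nh) * (1 - 0.36)

# Large-memory nodes: read here as an additional saving of up to 52% relative
# to the standard-memory case (one possible reading of the abstract).
large_kwh = standard_kwh * (1 - 0.52)

# Payback estimate for the hardware upgrade, against a five-year lifetime.
price_eur_per_kwh = 0.15          # assumed energy price
upgrade_cost_eur = 4000.0         # assumed cost of the large-memory upgrade
jobs_per_year = 1000              # assumed throughput of such jobs
saving_per_job = (standard_kwh - large_kwh) * price_eur_per_kwh
payback_years = upgrade_cost_eur / (saving_per_job * jobs_per_year)

print(f"baseline: {baseline_nh:.0f} node-hours, {energy_kwh(baseline_nh):.1f} kWh")
print(f"standard-memory scaled-in: {standard_kwh:.1f} kWh")
print(f"large-memory scaled-in: {large_kwh:.1f} kWh")
print(f"estimated payback: {payback_years:.1f} years (vs. 5-year lifetime)")
```

With these assumed figures the upgrade pays for itself in roughly 3.6 years, i.e., within the five-year system lifetime mentioned in the abstract; different assumptions shift the break-even point accordingly.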
