Paper Information - Green Queue: Customized Large-Scale Clock Frequency Scaling

Green Queue: Customized Large-Scale Clock Frequency Scaling

We examine the scalability of a set of techniques related to Dynamic Voltage-Frequency Scaling (DVFS) on HPC systems, with the goal of reducing the energy consumption of scientific applications through Green Queue, an application-aware analysis and runtime framework. Green Queue changes CPU clock frequencies in response to intra-node and inter-node observations about application behavior. Our intra-node approach reduces CPU clock frequencies, and therefore power consumption, while CPUs lack computational work due to inefficient data movement. Our inter-node approach reduces clock frequencies for MPI ranks that lack computational work. We investigate these techniques on a set of large scientific applications on 1024 cores of Gordon, an Intel Sandy Bridge-based supercomputer at the San Diego Supercomputer Center. Our optimal intra-node technique showed an average measured energy savings of 10.6% and a maximum of 21.0% over regular application runs. Our optimal inter-node technique showed an average energy savings of 17.4% and a maximum of 31.7%.

Ananta Tiwari | Michael Laurenzano | Laura Carrington | Allan Snavely | Joshua Peraza
