PMaC's Green Queue: a framework for selecting energy-optimal DVFS configurations in large-scale MPI applications

This article presents Green Queue, a production-quality tracing and analysis framework for implementing application-aware dynamic voltage and frequency scaling (DVFS) for Message Passing Interface (MPI) applications in high-performance computing. Green Queue uses both intertask and intratask DVFS techniques. The intertask technique targets applications with imbalanced workloads: it reduces the CPU clock frequency, and therefore the power draw, of ranks with lighter workloads. The intratask technique targets balanced workloads in which all tasks synchronously run the same code. It identifies program phases and selects the energy-optimal frequency for each phase by predicting the power and measuring the performance response of that phase to frequency changes. Both techniques are evaluated on 1024 cores of Gordon, a supercomputer at the San Diego Supercomputer Center built with Intel Xeon E5-2670 (Sandy Bridge) processors. Green Queue achieves energy savings of up to 21% with the intratask strategy and up to 32% with the intertask strategy. Copyright © 2013 John Wiley & Sons, Ltd.
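The two selection ideas in the abstract can be illustrated with a minimal Python sketch. It is not Green Queue's implementation: the names `Sample`, `pick_phase_frequency`, and `intertask_frequencies` are hypothetical, the sample numbers are invented for illustration, and the intertask helper assumes that runtime scales inversely with clock frequency, a simplification the article does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One candidate frequency for a program phase (hypothetical structure)."""
    freq_ghz: float
    runtime_s: float  # measured runtime of the phase at this frequency
    power_w: float    # predicted average power at this frequency


def energy_joules(sample):
    # Energy is average power multiplied by elapsed time.
    return sample.power_w * sample.runtime_s


def pick_phase_frequency(samples, max_slowdown=None):
    """Intratask idea: choose the candidate with the lowest energy.

    max_slowdown (assumption, e.g. 0.05 for 5%) optionally discards candidates
    that run slower than the fastest candidate by more than that fraction.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    fastest = min(s.runtime_s for s in samples)
    allowed = [
        s for s in samples
        if max_slowdown is None or s.runtime_s <= fastest * (1 + max_slowdown)
    ]
    return min(allowed, key=energy_joules)


def intertask_frequencies(work_s, freqs_ghz):
    """Intertask idea: give each rank the lowest frequency that still lets it
    finish no later than the most heavily loaded rank.

    work_s holds each rank's compute time at the highest frequency.
    Assumes runtime scales as 1/frequency (a simplification).
    """
    f_max = max(freqs_ghz)
    critical = max(work_s)
    chosen = []
    for w in work_s:
        # Small tolerance guards against floating-point rounding.
        feasible = [
            f for f in freqs_ghz if w * f_max / f <= critical * (1 + 1e-9)
        ]
        chosen.append(min(feasible))
    return chosen


# Illustrative, made-up numbers for a single phase.
phase = [
    Sample(freq_ghz=2.6, runtime_s=10.0, power_w=100.0),
    Sample(freq_ghz=2.0, runtime_s=10.5, power_w=80.0),
    Sample(freq_ghz=1.2, runtime_s=16.0, power_w=60.0),
]
best = pick_phase_frequency(phase)
rank_freqs = intertask_frequencies([10.0, 5.0, 8.0], [1.2, 1.6, 2.0, 2.6])
```

In this sketch a phase whose runtime barely changes as the clock drops is assigned a lower frequency, while a phase whose runtime stretches sharply stays at a higher one. The optional slowdown bound is an added design choice for illustration, not something the abstract describes.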
