Characterizing the Performance and Energy Attributes of Scientific Simulations

We characterize the performance and energy attributes of scientific applications based on nonlinear partial differential equations (PDEs), where the dominant cost is the solution of sparse linear systems. We obtain performance and energy metrics using cycle-accurate emulations on a processor and memory system derived from the PowerPC RISC architecture, with extensions to resemble the processor in the BlueGene/L. These results indicate that low-power CPU modes such as Dynamic Voltage Scaling (DVS) can indeed save energy, at the expense of performance. We then consider the impact of certain memory subsystem optimizations and demonstrate that, in conjunction with DVS, they can provide both faster execution and lower energy consumption. For example, on the optimized architecture, if DVS is used to scale the processor down to 600 MHz, execution times are faster by 45% and energy is reduced by 75% compared to the original architecture at 1 GHz. The insights gained from this study can help scientific applications make better use of the low-power modes of processors and can guide the selection of hardware optimizations in future power-aware, high-performance computers.
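The 600 MHz example can be checked with simple energy bookkeeping. The sketch below is illustrative only and is not taken from the study: it assumes that "faster by 45%" means execution time falls to 55% of the baseline, that energy is average power multiplied by execution time, and it uses the textbook CMOS dynamic-power relation (power proportional to V^2 * f) rather than the authors' power model. The function names are hypothetical.

```python
def relative_power(time_reduction, energy_reduction):
    """Average power relative to the baseline, using E = P * t.

    Both arguments are fractional reductions versus the baseline,
    e.g. 0.45 means the quantity dropped by 45%.
    """
    rel_time = 1.0 - time_reduction      # assumed reading of "faster by 45%"
    rel_energy = 1.0 - energy_reduction
    return rel_energy / rel_time


def dynamic_power_ratio(freq_ratio, voltage_ratio):
    """Textbook CMOS dynamic-power scaling, P ~ C * V^2 * f (an assumption here)."""
    return voltage_ratio ** 2 * freq_ratio


# Figures quoted in the abstract: 45% faster, 75% less energy.
implied = relative_power(0.45, 0.75)
print(f"implied average power vs. baseline: {implied:.3f}")

# Frequency scaling alone (1 GHz -> 600 MHz, voltage unchanged) under the
# textbook relation; lowering the voltage as well reduces this further.
print(f"dynamic power with frequency scaling only: {dynamic_power_ratio(0.6, 1.0):.3f}")
```

Under these assumptions, the quoted figures imply an average power of roughly 45% of the baseline. That is lower than frequency scaling alone would give under the textbook relation, which is consistent with voltage being reduced alongside frequency and with the memory optimizations contributing to the savings.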
