A component infrastructure for performance and power modeling of parallel scientific applications

Characterizing the performance of scientific applications is essential for effective code optimization, both by compilers and by high-level adaptive numerical algorithms. Although power efficiency is becoming increasingly important in current high-performance architectures, little or no hardware or software support exists for detailed power measurements. Power models based on hardware counters are a promising way to guide software techniques for reducing power consumption. We present a component-based infrastructure for performance and power modeling of parallel scientific applications. The power model uses on-chip hardware performance counters and is designed to model power consumption on modern multiprocessor and multicore systems. Our tool infrastructure includes application components as well as components for performance and power measurement and analysis. We collect performance data with the TAU performance component and, using the PerfExplorer component, apply the power model to the performance and power analysis of a PETSc-based parallel fluid dynamics application.
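The abstract does not give the form of the power model, so the following is only an illustrative sketch of how a counter-based power estimate can be computed, not the model used in this work. It assumes a simple linear form: a fixed idle power plus a per-event energy cost multiplied by each event's rate. The event names, the weights, and the idle power are hypothetical placeholders; in practice such coefficients would have to be calibrated against measured power on the target machine.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CounterSample:
    """Hardware counter totals collected over one measurement interval."""
    seconds: float              # wall-clock length of the interval
    counts: Mapping[str, int]   # event name -> number of events observed


def estimate_power(sample: CounterSample,
                   joules_per_event: Mapping[str, float],
                   idle_watts: float) -> float:
    """Return estimated average power in watts for one interval.

    Assumed (hypothetical) linear model:
        power = idle_watts + sum(joules_per_event[e] * count[e] / seconds)
    Events without a weight contribute nothing to the estimate.
    """
    if sample.seconds <= 0:
        raise ValueError("interval length must be positive")
    power = idle_watts
    for event, count in sample.counts.items():
        rate = count / sample.seconds                      # events per second
        power += joules_per_event.get(event, 0.0) * rate   # J/event * events/s = W
    return power


def estimate_energy(sample: CounterSample,
                    joules_per_event: Mapping[str, float],
                    idle_watts: float) -> float:
    """Return estimated energy in joules for the interval (power * time)."""
    return estimate_power(sample, joules_per_event, idle_watts) * sample.seconds


# Hypothetical counter totals and weights, for illustration only.
sample = CounterSample(
    seconds=2.0,
    counts={"instructions": 4_000_000_000, "l2_misses": 10_000_000},
)
weights = {"instructions": 5e-9, "l2_misses": 2e-7}  # joules per event (made up)

watts = estimate_power(sample, weights, idle_watts=20.0)
joules = estimate_energy(sample, weights, idle_watts=20.0)
print(f"estimated power: {watts:.2f} W, energy: {joules:.2f} J")
```

Because the estimate depends only on counter totals and elapsed time, it can be evaluated per routine or per process from profile data that already records those counters.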
