A comparison of online and offline strategies for program adaptation

Multithreaded programs executing on modern high-end computing systems have many potential avenues for adapting their execution to improve performance, energy consumption, or both. Program adaptation occurs whenever an application has multiple execution modes available and selects one based on information collected during program execution. Reaching a decision about how best to adapt therefore requires some degree of online or offline analysis, and there are many tradeoffs to consider when choosing which form of analysis to use: the overheads these strategies carry vary widely in both degree and type, as does their effectiveness. In this paper, we qualitatively and quantitatively analyze the pros and cons of specific online and offline forms of information collection and analysis for use in dynamic program adaptation in the context of high-performance computing. We focus on recommending which strategy to employ for users with specific requirements. To justify our recommendations, we use data collected from two offline and three online analysis strategies applied to a specific power-performance adaptation technique, concurrency throttling. We provide a two-level analysis, first comparing online and offline strategies against each other and then comparing strategies within each category. Our results show clear trends in the appropriateness of particular strategies depending on the length of application execution (more specifically, the number of iterations in the program) as well as on expected usage characteristics, ranging from one execution to many, with fixed versus variable program inputs across executions.
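To make the idea of online adaptation via concurrency throttling concrete, the sketch below times one iteration of a parallel loop at each candidate thread count and then throttles to the fastest observed configuration for the remaining iterations. This is a minimal, hypothetical Python sketch and not the paper's implementation: the function names, the use of `ThreadPoolExecutor`, and the simple exhaustive search over candidate counts are our own assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_iteration(work_fn, items, n_threads):
    """Run one parallel iteration at a given thread count; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Force evaluation of the whole iteration before timing stops.
        list(pool.map(work_fn, items))
    return time.perf_counter() - start

def online_throttle(work_fn, items, candidate_counts, n_iterations):
    """Online strategy: spend the first len(candidate_counts) iterations
    measuring each concurrency level, then run the rest at the fastest one."""
    timings = {n: run_iteration(work_fn, items, n) for n in candidate_counts}
    best = min(timings, key=timings.get)
    for _ in range(n_iterations - len(candidate_counts)):
        run_iteration(work_fn, items, best)
    return best
```

An offline strategy, by contrast, would perform the timing search in separate profiling runs and hard-code the chosen thread count before the production execution, trading per-run search overhead for sensitivity to input changes.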
