Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization

In this paper, we consider the interaction between application programmers and tools that automatically search a space of application-level parameters believed to have a significant impact on application performance. We study performance tuning of a large scientific application, the visualization component of a molecular dynamics simulation. The key contribution of the approach is the use of high-level, programmer-specified models of the expected performance behavior of individual parameters. We use these models to reduce the search space associated with the range of parameter values, and we achieve performance close to that of a more exhaustive search of the parameter space. With this case study, we show the importance of appropriate parameter selection: for a particular input data set and processor configuration, best-case and worst-case performance differ by up to a factor of 17. We show that through the use of models we can drastically reduce search time, examining only 0.3% to 5% of the search space, and usually select an implementation whose performance is within 0.84% to 15% of the best, even though the models are not completely accurate.
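To illustrate how a programmer-specified model can shrink the search over one parameter's range, the following is a minimal sketch, not the system described in this paper. It assumes a hypothetical model stating that run time first decreases and then increases as the parameter grows (strictly unimodal behavior). Under that assumption, a ternary search can discard most candidate values without measuring them. The names `exhaustive_search`, `unimodal_search`, and `measure` are illustrative assumptions, and `measure` stands in for timing one run of the application with a given parameter value.

```python
from typing import Callable, Dict, Sequence, Tuple


def exhaustive_search(
    values: Sequence[int], measure: Callable[[int], float]
) -> Tuple[int, int]:
    """Baseline: measure every candidate value and keep the fastest."""
    best_value = min(values, key=measure)
    return best_value, len(values)


def unimodal_search(
    values: Sequence[int], measure: Callable[[int], float]
) -> Tuple[int, int]:
    """Model-guided search over sorted `values`.

    Assumes (hypothetical model) that run time strictly falls and then
    strictly rises along `values`. Returns the selected value and the
    number of distinct measurements performed.
    """
    cache: Dict[int, float] = {}

    def timed(i: int) -> float:
        # Measure each index at most once; real runs are expensive.
        if i not in cache:
            cache[i] = measure(values[i])
        return cache[i]

    lo, hi = 0, len(values) - 1
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if timed(m1) < timed(m2):
            hi = m2 - 1  # the minimum must lie left of m2
        else:
            lo = m1 + 1  # the minimum must lie right of m1
    best_index = min(range(lo, hi + 1), key=timed)
    return values[best_index], len(cache)


if __name__ == "__main__":
    # Synthetic stand-in for timing runs; a real `measure` would launch
    # the application and return its elapsed time.
    candidates = list(range(1, 101))
    synthetic_cost = lambda v: (v - 37) ** 2 + 5.0
    print(exhaustive_search(candidates, synthetic_cost))
    print(unimodal_search(candidates, synthetic_cost))
```

If the model is only approximately right (for example, measurement noise breaks strict unimodality), the pruned search may settle on a value that is near the best rather than the best itself. That is the kind of trade-off between search time and final performance the abstract reports.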
