Dynamic Selection of Implementation Variants of Sequential Iterated Runge-Kutta Methods with Tile Size Sampling

This paper describes an efficient self-adaptive procedure for iterated Runge-Kutta (IRK) methods, a class of solution methods for initial value problems (IVPs) of ordinary differential equations (ODEs). IRK methods execute a potentially large number of discrete time steps to compute the solution of an IVP. The performance of an IRK solver may depend strongly on the specific characteristics of the given IVP and on the hardware architecture on which the solver is executed. To address this problem, this paper applies dynamic auto-tuning to the sequential execution of IRK methods. Auto-tuning is a promising technique for avoiding time-consuming and extensive manual tuning. Our self-adaptive IRK solver exploits the time-stepping nature of the IRK method: during the first time steps, it selects from a candidate pool the implementation variant that is fastest for the given IVP on the target architecture, and it then uses this variant to compute all remaining time steps. The implementation variants in the candidate pool were derived by modifying the loop structure of the basic algorithm. For the variants that use loop tiling, we consider different tile sizes during the auto-tuning phase to further improve the performance of the self-adaptive IRK solver. Runtime experiments demonstrate the efficiency of the self-adaptive IRK solver for different IVPs on different hardware architectures.
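The selection procedure described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the abstract does not specify the base method, the concrete loop variants, or the timing policy, so the 2-stage Radau IIA tableau, the fixed iteration count `M_ITER`, the hypothetical variant builders (`update_stage_major`, `update_component_major`, `make_tiled_update`), and the policy of timing exactly one real time step per candidate are all assumptions. Because every variant performs the same arithmetic in the same order, the tuning steps advance the solution instead of being thrown away.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
RHS = Callable[[float, Vector], Vector]
Step = Callable[[RHS, float, Vector, float], Vector]

# ASSUMPTION: illustrative base method, 2-stage Radau IIA (order 3).
A = [[5.0 / 12.0, -1.0 / 12.0], [3.0 / 4.0, 1.0 / 4.0]]
B = [3.0 / 4.0, 1.0 / 4.0]
C = [1.0 / 3.0, 1.0]
M_ITER = 3  # ASSUMPTION: fixed number of fixed-point iterations per step


def _mu_entry(y: Vector, h: float, F: List[Vector], l: int, j: int) -> float:
    # mu_l[j] = y[j] + h * sum_i A[l][i] * F_i[j]; same summation order in all variants.
    acc = 0.0
    for i in range(len(A)):
        acc += A[l][i] * F[i][j]
    return y[j] + h * acc


def update_stage_major(y, h, F, mu):
    # Hypothetical variant 1: stage loop outermost, component loop inside.
    for l in range(len(A)):
        for j in range(len(y)):
            mu[l][j] = _mu_entry(y, h, F, l, j)


def update_component_major(y, h, F, mu):
    # Hypothetical variant 2: loops interchanged, component loop outermost.
    for j in range(len(y)):
        for l in range(len(A)):
            mu[l][j] = _mu_entry(y, h, F, l, j)


def make_tiled_update(tile: int):
    # Hypothetical variant 3: component loop tiled; each tile is processed for all stages.
    def update(y, h, F, mu):
        for j0 in range(0, len(y), tile):
            for l in range(len(A)):
                for j in range(j0, min(j0 + tile, len(y))):
                    mu[l][j] = _mu_entry(y, h, F, l, j)
    return update


def make_irk_step(update) -> Step:
    # One IRK time step: predictor mu^(0) = y, M_ITER corrector sweeps, then the new y.
    def step(f: RHS, t: float, y: Vector, h: float) -> Vector:
        s = len(A)
        mu = [list(y) for _ in range(s)]
        for _ in range(M_ITER):
            F = [f(t + C[i] * h, mu[i]) for i in range(s)]
            update(y, h, F, mu)  # F is fully computed before mu is overwritten
        F = [f(t + C[i] * h, mu[i]) for i in range(s)]
        return [y[j] + h * sum(B[i] * F[i][j] for i in range(s)) for j in range(len(y))]
    return step


def build_candidate_pool(tile_sizes: List[int]) -> Dict[str, Step]:
    pool = {
        "stage-major": make_irk_step(update_stage_major),
        "component-major": make_irk_step(update_component_major),
    }
    for b in tile_sizes:  # tile size sampling: one candidate per sampled tile size
        pool[f"tiled-{b}"] = make_irk_step(make_tiled_update(b))
    return pool


def self_adaptive_solve(f: RHS, t0: float, y0: Vector, h: float, n_steps: int,
                        pool: Dict[str, Step]) -> Tuple[Vector, Optional[str], Dict[str, float]]:
    y = list(y0)
    names = list(pool)
    timings: Dict[str, float] = {}
    best: Optional[str] = None
    for k in range(n_steps):
        t = t0 + k * h
        if k < len(names):
            # Tuning phase: candidate k computes a genuine time step and is timed.
            start = time.perf_counter()
            y = pool[names[k]](f, t, y, h)
            timings[names[k]] = time.perf_counter() - start
            continue
        if best is None:
            best = min(timings, key=timings.__getitem__)
        y = pool[best](f, t, y, h)  # production phase: fastest variant only
    if best is None and timings:
        best = min(timings, key=timings.__getitem__)
    return y, best, timings
```

For example, `self_adaptive_solve(lambda t, y: [-v for v in y], 0.0, [1.0] * 1000, 0.01, 100, build_candidate_pool([16, 64, 256]))` would spend its first five steps timing the five candidates and the remaining 95 steps on the fastest one. Timing a single step per candidate keeps the tuning phase short but is sensitive to measurement noise and warm-up effects; repeating the measurements is an obvious refinement. In interpreted Python the loop orders differ mostly in interpreter overhead, whereas in a compiled implementation they would differ in cache behavior, which is what loop tiling targets.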
