Optimizing locality for ODE solvers

Runge-Kutta methods are popular methods for the solution of systems of ordinary differential equations and are provided by many scientific libraries. The performance of Runge-Kutta methods depends not only on the specific application problem to be solved but also on the characteristics of the target machine. For processors with a memory hierarchy, the locality of the data reference pattern has a large impact on the efficiency of a program. In this paper, we describe program transformations for Runge-Kutta methods that result in programs with improved locality behavior. The transformations are based on properties of the solution method but are independent of the specific application problem and the specific target machine, so that the resulting implementation is suitable as a library function. We show that the locality improvement leads to performance gains on different target machines. We also demonstrate how the locality of memory references can be further increased by exploiting the dependence structure of the right-hand side function of specific ordinary differential equations.
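To make concrete the kind of transformation meant here, the following is a minimal sketch of one loop reordering in a single explicit Runge-Kutta step. It is an illustration under stated assumptions, not the transformations developed in the paper: the function names `rk_step_stage_major` and `rk_step_component_major` are hypothetical, the classical fourth-order tableau is used only as a familiar example, and the right-hand side `f` is assumed to take `(t, y)` and return a list. Pure Python will not show a cache effect; the sketch only shows the loop structure that a compiled implementation would reorder.

```python
# Classical RK4 Butcher tableau (standard coefficients, chosen for illustration).
A = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
B = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]
C = [0.0, 0.5, 0.5, 1.0]


def rk_step_stage_major(f, t, y, h):
    """Baseline: each earlier stage vector triggers a full sweep over all n components."""
    n, s = len(y), len(B)
    k = []
    for l in range(s):
        w = list(y)
        for j in range(l):              # outer loop over stages
            for i in range(n):          # inner loop streams through the whole vector
                w[i] += h * A[l][j] * k[j][i]
        k.append(f(t + C[l] * h, w))
    y_new = list(y)
    for l in range(s):                  # s separate sweeps over y_new
        for i in range(n):
            y_new[i] += h * B[l] * k[l][i]
    return y_new


def rk_step_component_major(f, t, y, h):
    """Interchanged loops: one sweep over the components, stages in the inner loop."""
    n, s = len(y), len(B)
    k = []
    for l in range(s):
        w = [0.0] * n
        for i in range(n):              # each component is written exactly once
            acc = 0.0
            for j in range(l):
                acc += A[l][j] * k[j][i]
            w[i] = y[i] + h * acc
        k.append(f(t + C[l] * h, w))
    y_new = [0.0] * n
    for i in range(n):                  # single sweep for the final update
        acc = 0.0
        for l in range(s):
            acc += B[l] * k[l][i]
        y_new[i] = y[i] + h * acc
    return y_new


if __name__ == "__main__":
    # Test problem y' = -y; both loop orders should agree up to rounding.
    decay = lambda t, y: [-v for v in y]
    print(rk_step_stage_major(decay, 0.0, [1.0, 2.0], 0.1))
    print(rk_step_component_major(decay, 0.0, [1.0, 2.0], 0.1))
```

Both variants perform the same arithmetic up to rounding. The difference is that the second writes each component of the argument vector and of the result once per sweep instead of once per stage, which is the kind of reuse that matters when the vectors do not fit in cache. Note that neither variant touches the evaluation of `f` itself, since in general each component of `f` may depend on every component of its argument.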
