Cache-Aware Multigrid Methods for Solving Poisson's Equation in Two Dimensions

Abstract Conventional implementations of iterative numerical algorithms, especially multigrid methods, merely reach a disappointing small percentage of the theoretically available CPU performance when applied to representative large problems. One of the most important reasons for this phenomenon is that the need for data locality due to poor main memory latency and limited bandwidth is entirely neglected by many developers designing numerical software. Only when most of the data to be accessed during the computation are found in the system cache (or in one of the caches if the machine architecture comprises a cache hierarchy) fast program execution can be expected. Otherwise, i.e. in case of a significant rate of cache misses, the processor must stay idle until the necessary operands are fetched from main memory, whose cycle time is in general extremely large compared to the time needed to execute a floating point instruction. In this paper, we describe program transformation techniques developed to improve the cache performance of two-dimensional multigrid algorithms. Although we merely consider the solution of Poisson's equation on the unit square using structured grids, our techniques provide valuable hints towards the efficient treatment of more general problems.

[1]  Kei Davis,et al.  Optimizing Transformations of Stencil Operations for Parallel Object-Oriented Scientific Frameworks on Cache-Based Architectures , 1998, ISCOPE.

[2]  James Demmel,et al.  Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.

[3]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[4]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[5]  Hermann Hellwagner,et al.  Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations , 1997 .

[6]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, TOCS.

[7]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.

[8]  David E. Keyes,et al.  Prospects for CFD on Petaflops Systems , 1997 .

[9]  Ulrich Rüde,et al.  Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .

[10]  Chau-Wen Tseng,et al.  Data transformations for eliminating conflict misses , 1998, PLDI.

[11]  Jean-Loup Baer,et al.  A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.

[12]  Steven G. Johnson,et al.  The Fastest Fourier Transform in the West , 1997 .

[13]  Elana D. Granston,et al.  A Cache Visualization Tool , 1997, Computer.

[14]  Craig C. Douglas,et al.  Caching in with Multigrid Algorithms: Problems in Two Dimensions , 1996, Parallel Algorithms Appl..

[15]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[16]  Wolfgang Hackbusch,et al.  Multi-grid methods and applications , 1985, Springer series in computational mathematics.