Cache-Efficient Multigrid Algorithms

Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods such as Gauss–Seidel and Red–Black Gauss–Seidel form the principal computational component of multigrid and therefore largely determine its efficiency. Within multigrid, these iterative solvers are run for only a small number of iterations (2–8). We exploit this property to develop a cache-efficient multigrid method by improving the memory behavior of the linear relaxation methods. The gains in our cache-efficient linear relaxation algorithm come from two sources: fewer data cache and TLB misses, and fewer memory references, achieved by keeping values register-resident. Our optimizations apply to multigrid for linear systems arising from constant-coefficient elliptic PDEs on structured grids. Experiments on five modern computing platforms show a performance improvement of 1.15–2.7 times over a standard implementation of the Full Multigrid V-Cycle.
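To make the memory-behavior idea concrete, the following is a minimal sketch of one well-known locality optimization for Red–Black Gauss–Seidel: fusing the red and black passes so that each grid row is revisited while it is still likely to be in cache. It is an illustration under stated assumptions, not necessarily the algorithm developed in this paper. The assumptions are a 2D Poisson problem with a 5-point stencil, zero Dirichlet boundaries, and a square grid; the function names (`relax_row`, `rb_gauss_seidel_standard`, `rb_gauss_seidel_fused`, `make_grid`) are hypothetical.

```python
def make_grid(n):
    """(n + 2) x (n + 2) grid of zeros; the outer ring is the fixed boundary."""
    return [[0.0] * (n + 2) for _ in range(n + 2)]


def relax_row(u, f, h, i, color):
    """Relax the points of one color in interior row i.

    color 0 ("red") means (i + j) is even, color 1 ("black") means odd.
    """
    n = len(u) - 2
    start = 1 + ((i + 1 + color) % 2)  # first interior column of this color
    for j in range(start, n + 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1]
                          + h * h * f[i][j])


def rb_gauss_seidel_standard(u, f, h, sweeps):
    """Reference version: a full red pass, then a full black pass.

    Each sweep streams the whole grid through the cache twice.
    """
    n = len(u) - 2
    for _ in range(sweeps):
        for color in (0, 1):
            for i in range(1, n + 1):
                relax_row(u, f, h, i, color)
    return u


def rb_gauss_seidel_fused(u, f, h, sweeps):
    """Fused version: black points in row i - 1 are relaxed right after
    red points in row i, so each sweep touches the grid roughly once.

    A black point in row i - 1 depends only on red points in rows
    i - 2, i - 1, and i, all of which are already updated at that moment.
    """
    n = len(u) - 2
    for _ in range(sweeps):
        relax_row(u, f, h, 1, 0)            # red points in the first row
        for i in range(2, n + 1):
            relax_row(u, f, h, i, 0)        # red points in row i
            relax_row(u, f, h, i - 1, 1)    # black points in row i - 1
        relax_row(u, f, h, n, 1)            # black points in the last row
    return u


if __name__ == "__main__":
    n = 6
    h = 1.0 / (n + 1)
    f = [[1.0] * (n + 2) for _ in range(n + 2)]
    a = rb_gauss_seidel_standard(make_grid(n), f, h, 3)
    b = rb_gauss_seidel_fused(make_grid(n), f, h, 3)
    print(a == b)
```

Both versions perform the same arithmetic on the same operand values, only in a different order across rows, so they produce identical results. In a compiled language the fused loop is where the cache benefit would show up; the Python version only demonstrates the reordering and its correctness.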
