Cache-Efficient Multigrid Algorithms

Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods such as Gauss-Seidel and Red-Black Gauss-Seidel are the principal computational component of multigrid and therefore largely determine its efficiency. Within multigrid, these iterative solvers run for only a small number of iterations (2-8). We exploit this property to develop a cache-efficient multigrid algorithm by improving the memory behavior of the linear relaxation methods. The gains in our cache-efficient linear relaxation algorithm come from two sources: fewer data cache and TLB misses, and fewer memory references, achieved by keeping values register-resident. Experiments on five modern computing platforms show a speedup of 1.15 to 2.7 times over a standard implementation of the Full Multigrid V-Cycle.
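
To make the memory-behavior argument concrete, the sketch below contrasts a standard Red-Black Gauss-Seidel sweep with a row-fused variant for the 5-point Poisson stencil. It is an illustration only, not the algorithm of this paper: the fused ordering is a commonly described locality transformation, and the names `update_row`, `red_black_standard`, `red_black_fused`, and `make_grid`, along with the model problem itself, are assumptions introduced here. The standard sweep traverses the grid twice per iteration (all red points, then all black points), whereas the fused sweep updates the black points of row i-1 immediately after the red points of row i, so each row is reused while it is likely still in cache.

```python
RED, BLACK = 0, 1  # a point (i, j) has color (i + j) % 2


def update_row(u, f, h2, i, color):
    """Relax all interior points of the given color in row i (5-point stencil)."""
    n = len(u) - 2                      # number of interior points per side
    start = 1 + ((i + 1 + color) % 2)   # first interior column with that color
    for j in range(start, n + 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])


def red_black_standard(u, f, h, sweeps):
    """Two full passes over the grid per sweep: all red, then all black."""
    n = len(u) - 2
    for _ in range(sweeps):
        for color in (RED, BLACK):
            for i in range(1, n + 1):
                update_row(u, f, h * h, i, color)


def red_black_fused(u, f, h, sweeps):
    """One pass per sweep: red in row i, then black in row i - 1.

    Black points in row i - 1 only need red points in rows i - 2, i - 1
    and i, all of which are already updated, so the result is unchanged.
    """
    n = len(u) - 2
    for _ in range(sweeps):
        for i in range(1, n + 1):
            update_row(u, f, h * h, i, RED)
            if i > 1:
                update_row(u, f, h * h, i - 1, BLACK)
        update_row(u, f, h * h, n, BLACK)  # last row's black points


def make_grid(n):
    """(n + 2) x (n + 2) grid: zero boundary, arbitrary interior start values."""
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            u[i][j] = float((7 * i + 3 * j) % 5)
    return u
```

Because both orderings perform the same arithmetic on the same operands, they should produce identical iterates; only the traversal order, and hence the memory access pattern, differs. A speedup is not visible in interpreted Python. The same loop structure in a compiled language is where the reduced cache traffic would matter.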
