A Cache-Efficient Parallel Gauss-Seidel Solver with Alternating Tiling

We present a new cache-efficient parallel multi-layer Gauss-Seidel algorithm for solving 2D diffusion equations on distributed-memory machines. The algorithm is designed to improve cache behaviour and parallelism simultaneously. Its novelty lies in performing Gauss-Seidel in two alternating sweeping directions (with multiple layers, i.e., iterations, per direction) and in applying alternating tiling strategies, one for each of the two opposite sweeping directions, to the subdomain allocated to each processor. Its efficiency comes from a significant reduction in two sources of overhead: data cache misses and communication costs. Compared with two commonly used parallel Gauss-Seidel algorithms, our algorithm shows good performance and scalability in a cluster computing environment.
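
To make the idea of alternating sweeping directions with several layers per direction concrete, the following is a minimal serial sketch in Python. It is not the tiled, distributed-memory algorithm described above: it assumes a steady-state 2D diffusion (Laplace) problem with fixed boundary values and a 5-point stencil, and it omits tiling and inter-processor communication entirely. The names `make_grid`, `gauss_seidel_alternating`, `layers`, and `rounds` are hypothetical and chosen only for illustration.

```python
def make_grid(n):
    """Hypothetical test grid: boundary values follow u = i / (n - 1), interior starts at 0."""
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                u[i][j] = i / (n - 1)
    return u


def gauss_seidel_alternating(u, layers=2, rounds=50):
    """Serial 5-point Gauss-Seidel with two alternating sweep directions.

    Assumption: each round performs `layers` forward sweeps (ascending i, j)
    followed by `layers` backward sweeps (descending i, j). Boundary cells
    are never modified. No tiling or parallelism is modelled here.
    """
    n, m = len(u), len(u[0])
    forward = (range(1, n - 1), range(1, m - 1))
    backward = (range(n - 2, 0, -1), range(m - 2, 0, -1))
    for _ in range(rounds):
        for rows, cols in (forward, backward):
            for _ in range(layers):
                for i in rows:
                    for j in cols:
                        # In-place update: uses neighbours already updated in this sweep.
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1])
    return u


if __name__ == "__main__":
    grid = gauss_seidel_alternating(make_grid(6), layers=2, rounds=100)
    for row in grid:
        print(" ".join(f"{value:.3f}" for value in row))
```

With the linear boundary data chosen here, the interior should converge to the same linear profile, which gives a simple check that both sweep directions are updating the grid correctly.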
