A Parallel Finite Difference Stencil Algorithm Based on Iterative Space Alternate Tiling

Finite difference stencils are fundamental computations in a broad range of scientific and engineering programs. To improve data locality and reduce communication overhead, this paper proposes a novel alternate tiling stencil algorithm for distributed-memory machines that exploits the properties of the underlying iterative algorithm. The serial execution process of the iterative method is given first: it takes the sequence of iteration space tiles as the execution order and uses the time skewing technique to partition the iteration space. In this process, the nodes within a tile can be traversed multiple times, which improves data locality. A parallel algorithm based on iteration space tiling is then presented. It uses an improved polyhedral model to implement the tiling and reorders the tiles of the iteration space to reduce cache misses as well as the cost of communication and synchronization. A theoretical comparison between alternate tiling and other parallelization techniques is given. Finally, numerical results confirm the effectiveness of the serial and parallel execution models of the alternate tiling finite difference stencil algorithm, in particular compared with domain-decomposition and red-black iterative methods, and show that the new parallel iterative method has good data locality, parallel efficiency, and scalability.
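The time skewing step mentioned above can be illustrated with a minimal sketch. It is not the alternate tiling algorithm of this paper: it assumes a 1D in-place Gauss-Seidel averaging stencil with fixed boundary values, and the function names, the tile sizes `tile_t` and `tile_j`, and the skewed index `j = i + t` are all choices made for this illustration. After skewing, every dependence distance in the (t, j) space is non-negative, so rectangular tiles can be executed one after another, and each grid point inside a tile is updated for several time steps before the next tile is touched.

```python
def gauss_seidel_naive(u, steps):
    """Reference order: sweep all interior points for step 0, then step 1, ..."""
    u = list(u)
    n = len(u)
    for t in range(steps):
        for i in range(1, n - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u


def gauss_seidel_time_skewed(u, steps, tile_t=4, tile_j=8):
    """Same updates, executed tile by tile in the skewed (t, j) space.

    Assumption for this sketch: j = i + t. The in-place dependences,
    (0, 1), (1, -1) and (1, 0) in (t, i), become (0, 1), (1, 0) and (1, 1)
    in (t, j), so rectangular tiles may run in lexicographic order.
    """
    u = list(u)
    n = len(u)
    j_max = (steps - 1) + (n - 2)  # largest skewed index reached by any update
    for tt in range(0, steps, tile_t):            # tiles along time
        for jj in range(1, j_max + 1, tile_j):    # tiles along skewed space
            for t in range(tt, min(tt + tile_t, steps)):
                # Clip the tile to the interior points that exist at step t.
                lo = max(jj, t + 1)
                hi = min(jj + tile_j - 1, t + n - 2)
                for j in range(lo, hi + 1):
                    i = j - t  # map back to the original grid index
                    u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

Because each update reads exactly the same operand values as in the reference order, the two functions return identical results. Only the traversal order changes, and with it the reuse of grid points within a tile.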
