Locality Optimizations for Jacobi Iteration on Distributed Parallel Systems

In this paper, we propose an inter-nest cache reuse optimization method for Jacobi codes. The method is easy to apply yet effective: it enhances the cache locality of Jacobi codes while preserving their coarse-grain parallelism. We compare our method with two previous locality enhancement techniques that can be used for Jacobi codes: time skewing and new tiling. We quantify the main factors contributing to the runtime of different Jacobi codes and perform experiments on a PC cluster to verify our analysis. The results show that our method performs worse than time skewing and new tiling on a uniprocessor, but better on a distributed parallel system.
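For readers unfamiliar with the term, the sketch below illustrates the general shape of a Jacobi code; it is not the optimization proposed in the paper. It assumes a 1D three-point averaging stencil with fixed boundary values, and the function name `jacobi_1d` and its parameters are hypothetical. Each time step contains two loop nests, an update nest and a copy-back nest, and the second nest re-reads data that the first has just written. Reuse of this kind between nests is what inter-nest locality optimizations typically target.

```python
def jacobi_1d(a, steps):
    """Run `steps` Jacobi sweeps of a 3-point averaging stencil.

    Illustrative baseline only (assumed stencil and boundary handling);
    the first and last elements are treated as fixed boundary values.
    """
    n = len(a)
    a = list(a)  # work on a copy so the caller's data is untouched
    b = list(a)  # scratch array holding the values of the next time step
    for _ in range(steps):
        # Nest 1: compute every interior point from the old values only.
        for i in range(1, n - 1):
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        # Nest 2: copy the new values back; this touches the same data
        # that nest 1 has just written.
        for i in range(1, n - 1):
            a[i] = b[i]
    return a
```

Within a time step, the iterations of each nest are independent of one another, which is the source of the coarse-grain parallelism that a locality optimization should preserve.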
