Parallel loop generation and scheduling

Loop tiling is an efficient loop transformation, applied mainly to expose coarse-grained parallelism in loops. Generating parallel loops from n-dimensional non-rectangular tiles is, however, a difficult task. This paper presents an efficient scheme for applying non-rectangular n-dimensional tiles to non-rectangular iteration spaces in order to generate parallel loops. To exploit wavefront parallelism efficiently, all tiles whose coordinates have the same sum are assumed to lie on the same wavefront. In addition, an improved block scheduling strategy is proposed for assigning the parallelepiped tiles on each wavefront to different processors.
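The wavefront rule above can be illustrated with a short Python sketch. It assumes a rectangular tile space (the paper itself targets non-rectangular spaces) and tile-level dependence vectors whose components are non-negative and not all zero, so every dependence strictly increases the coordinate sum and the tiles on one wavefront can run concurrently. The `block_schedule` function is a plain block distribution of each wavefront into contiguous chunks, not the improved strategy the paper proposes; both function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def wavefronts(tile_counts):
    """Group tile coordinates into wavefronts by the sum of their coordinates.

    tile_counts: number of tiles along each dimension. A rectangular tile
    space is an illustrative assumption, not the paper's general setting.
    Wavefront k holds every tile whose coordinates sum to k, in
    lexicographic order.
    """
    fronts = defaultdict(list)
    for tile in product(*(range(n) for n in tile_counts)):
        fronts[sum(tile)].append(tile)
    return [fronts[k] for k in sorted(fronts)]


def block_schedule(front, num_procs):
    """Plain block distribution: split one wavefront into contiguous chunks,
    one chunk per processor (some chunks may be empty)."""
    size = -(-len(front) // num_procs)  # ceiling division
    return [front[p * size:(p + 1) * size] for p in range(num_procs)]


if __name__ == "__main__":
    # Wavefronts run one after another; tiles within a wavefront run in parallel.
    for step, front in enumerate(wavefronts((3, 3))):
        for proc, tiles in enumerate(block_schedule(front, 2)):
            print(f"step {step}, processor {proc}: {tiles}")
```

For a 3 x 3 tile space this yields five wavefronts; the middle one, `[(0, 2), (1, 1), (2, 0)]`, is split between two processors as `[(0, 2), (1, 1)]` and `[(2, 0)]`. The contiguous blocks keep neighbouring tiles on the same processor, which is the usual motivation for block rather than cyclic distribution.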
