Space–time mapping and tiling: a helpful combination

Tiling is a well-known technique for sequential compiler optimization, as well as for automatic program parallelization. However, in the context of parallelization, tiling should not be considered a stand-alone technique; it should be applied after a dedicated parallelization phase, in our case after space–time mapping. We show how tiling can benefit from space–time mapping, and we derive an algorithm for computing tiles that can minimize the number of communication startups, taking the number of physically available processors into account. We also show how the use of a simple cost model reduces real execution time. Copyright © 2004 John Wiley & Sons, Ltd.
