Forward Communication Only Placements and Their Use for Parallel Program Construction

The context of this paper is automatic parallelization by the space-time mapping method. A key issue in this approach is adjusting the granularity of the derived parallelism. For that purpose, we use tiling in both the space and time dimensions. While space tiling is always legal, time tiling is subject to legality constraints unless the placement is such that communications always go in the same direction (forward communications only, FCO). We derive an algorithm that automatically constructs an FCO placement if one exists. We show that the method is applicable to many familiar kernels and that it yields satisfactory speedups.
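To make the forward-communications-only condition concrete, here is a minimal sketch for a simplified setting. It assumes uniform dependences (constant distance vectors, target iteration minus source iteration) and a one-dimensional linear placement. Under those assumptions a placement is FCO when no dependence maps to a negative processor offset. The function name `is_fco` and the sample dependence vectors are hypothetical. This is a check only, not the construction algorithm derived in the paper.

```python
from typing import Sequence


def is_fco(placement: Sequence[int],
           dependences: Sequence[Sequence[int]]) -> bool:
    """Check the FCO condition for a linear placement (simplified setting).

    Assumption: each dependence is a constant distance vector
    (target iteration minus source iteration), and the processor
    coordinate of an iteration is the dot product placement . iteration.
    The placement is FCO if every dependence maps to a non-negative
    processor offset, i.e. data never flows to a lower processor.
    """
    return all(
        sum(p * d for p, d in zip(placement, dep)) >= 0
        for dep in dependences
    )


# Hypothetical uniform dependences of a 2-D loop nest.
deps = [(1, 0), (0, 1), (1, -1)]

print(is_fco((1, 0), deps))  # True: offsets are 1, 0, 1
print(is_fco((0, 1), deps))  # False: (1, -1) gives offset -1
```

In this example, placing iterations by their first index keeps all communication forward, whereas placing by the second index makes the dependence (1, -1) send data backward.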
