Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers

In distributed memory multicomputers, local memory accesses are much faster than accesses that require interprocessor communication. To reduce or even eliminate interprocessor communication, the array elements referenced by a program must be carefully distributed across the local memories of the processors for parallel execution. This work develops techniques that allow parallelizing compilers to allocate the array elements of nested loops onto multicomputers in a communication-free fashion. We first analyze the reference patterns among all arrays referenced by a nested loop, and then partition the iteration space into blocks that require no interblock communication. The arrays can be partitioned under the communication-free criteria either without data duplication or with duplicated data. Finally, we propose a heuristic method that maps the partitioned array elements and iterations onto fixed-size multicomputers while taking load balancing into account. With these methods, nested loops can execute on distributed memory multicomputers without any communication overhead. Moreover, the performance of the nonduplicate- and duplicate-data strategies is studied for matrix multiplication.
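To make the duplicate-data strategy concrete, the following is a minimal sketch (not the paper's actual algorithm) of a communication-free allocation for matrix multiplication: rows of A and C are distributed without duplication, B is replicated on every processor, and row blocks are sized for load balance, so each simulated processor computes its portion of C from purely local data. The function names and the simulated-processor loop are illustrative assumptions.

```python
import numpy as np

def partition_rows(n_rows, n_procs):
    """Split row indices into n_procs nearly equal blocks (load balancing)."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def communication_free_matmul(A, B, n_procs=4):
    """Hypothetical communication-free scheme with duplicated data:
    rows of A and C are distributed (nonduplicate); B is replicated
    on every processor, so all iterations touch only local data."""
    n = A.shape[0]
    C = np.zeros((n, B.shape[1]))
    for block in partition_rows(n, n_procs):
        # Each "processor" computes its row block of C entirely locally;
        # no interprocessor communication is needed during the loop.
        for i in block:
            C[i, :] = A[i, :] @ B
    return C

A = np.random.rand(6, 5)
B = np.random.rand(5, 4)
assert np.allclose(communication_free_matmul(A, B), A @ B)
```

The design trade-off this sketch illustrates is the one the abstract names: duplicating B eliminates all communication at the cost of extra memory per processor, whereas a nonduplicate partition of B would save memory but, for this reference pattern, could not keep every iteration local.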
