Mapping nested loops onto distributed memory multiprocessors

The paper presents Chain grouping, a new low-complexity method for partitioning the index space of a nested loop into groups with low intercommunication requirements, for mapping onto distributed-memory, mesh-connected architectures. First, the loop iterations are scheduled in time according to the hyperplane method, taking the minimum time displacement into account. Then the index space is divided into disjoint groups of related computations, which are assigned to different processors while preserving the optimal makespan. Chain grouping forms each group along a uniform chain of computations generated by a particular dependence vector, which is proved to be the best choice for reducing the total communication requirements. Inside every group the optimal hyperplane schedule is preserved, and the number of references to intragroup computations is considerably increased. The partitioned groups are then assigned to meshes of processors. The resulting space mapping maximises processor utilisation and reduces overall communication delays while preserving the optimal hyperplane time schedule.
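The two phases described above (hyperplane time scheduling with a minimum time displacement, then grouping the index space into chains along one dependence vector) can be illustrated on a toy loop nest. The Python sketch below is a minimal brute-force illustration under stated assumptions, not the paper's algorithm. The rectangular index space, the dependence set D = {(1,0), (0,1), (1,1)}, the search range for Π and the helper names (`hyperplane_schedule`, `chain_groups`, `intergroup_edges`) are all hypothetical. It assumes iteration j executes at t(j) = ⌊(Π·j − min Π·j) / disp⌋, where disp = min over d of Π·d. It also uses a raw count of cross-group dependence edges as a stand-in for the paper's communication cost model, which is not reproduced here.

```python
from itertools import product

# Hypothetical illustration: a 2-D loop nest with uniform dependence vectors.
# Names, dependence set and search range are assumptions, not the paper's.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b, sign=1):
    """Component-wise a + sign * b."""
    return tuple(x + sign * y for x, y in zip(a, b))


def index_space(upper):
    """All integer points j with 0 <= j_k < upper_k (rectangular index space)."""
    return list(product(*(range(u) for u in upper)))


def hyperplane_schedule(points, deps, search=3):
    """Brute-force search for the integer vector pi that minimises the makespan.

    Assumed timing: iteration p runs at t(p) = (pi.p - offset) // disp, where
    disp = min over d of pi.d is the minimum time displacement (must be >= 1).
    """
    best = None
    for pi in product(range(-search, search + 1), repeat=len(deps[0])):
        disp = min(dot(pi, d) for d in deps)
        if disp < 1:
            continue  # pi would violate at least one dependence
        values = [dot(pi, p) for p in points]
        offset = min(values)
        makespan = (max(values) - offset) // disp + 1
        if best is None or makespan < best[3]:
            best = (pi, disp, offset, makespan)
    return best


def chain_groups(points, d):
    """Split the index space into chains p, p+d, p+2d, ... along vector d."""
    space = set(points)
    groups = []
    for p in sorted(points):
        if add(p, d, -1) in space:
            continue  # p is not the head of a chain
        chain, q = [], p
        while q in space:
            chain.append(q)
            q = add(q, d)
        groups.append(chain)
    return groups


def intergroup_edges(groups, deps):
    """Count dependence edges whose two endpoints fall in different groups."""
    owner = {p: g for g, chain in enumerate(groups) for p in chain}
    return sum(
        1
        for p, g in owner.items()
        for d in deps
        if add(p, d) in owner and owner[add(p, d)] != g
    )


if __name__ == "__main__":
    deps = [(1, 0), (0, 1), (1, 1)]  # illustrative uniform dependences
    points = index_space((6, 4))
    pi, disp, offset, makespan = hyperplane_schedule(points, deps)
    print("pi =", pi, "disp =", disp, "makespan =", makespan)
    for d in deps:
        groups = chain_groups(points, d)
        # Successive points of a chain land in strictly later time steps,
        # so one processor per chain does not stretch the hyperplane schedule.
        for chain in groups:
            times = [(dot(pi, p) - offset) // disp for p in chain]
            assert all(a < b for a, b in zip(times, times[1:]))
        print(d, len(groups), "groups,", intergroup_edges(groups, deps), "cross-group edges")
```

For this toy 6 × 4 space the search returns Π = (1, 1) with a makespan of 9 steps. Grouping along (1, 0), (0, 1) and (1, 1) leaves 33, 35 and 38 cross-group edges respectively under this naive count. The assertion in the main block checks the property the text relies on: because Π·d ≥ disp for every dependence vector d, the scheduled times strictly increase along each chain. A processor that owns a whole chain therefore never has two of its iterations in the same time step.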