Scheduling tree-structured programs in the LogP model

The LogP model is a model of parallel computation that characterises a parallel computer architecture by four parameters: the latency L, the overhead o, the gap g and the number of processors P. We study the problem of constructing minimum-length schedules for tree-structured programs in the LogP model. This problem is proved to be NP-hard, even for outtrees of height two in LogP models with an unlimited number of processors. For outtrees of height two, a 2-approximation algorithm is presented. For intrees of height two, two approximation algorithms are presented: a 3-approximation algorithm for LogP models with an unrestricted number of processors and a (4 - 2/P)-approximation algorithm for LogP models with a finite number of processors. For the problem of constructing minimum-length schedules for d-ary intrees in a LogP model with a finite number of processors, three approximation algorithms are presented that are applicable in many models of parallel computation. The first constructs schedules for full d-ary intrees of length at most 2 + 2/d times the length of an optimal schedule plus the time required for (d + 1)P - 1 communication operations. The second constructs schedules on P processors of length at most d + 1 - (d^2 + d)/(d + P) times the length of a minimum-length schedule plus the time needed for d(P - 1) - 1 communication operations. The third constructs schedules of length at most 3 - 6/(P + 2) times the length of a minimum-length schedule plus the duration of d(d - 1)(P - 1) - 1 communication operations.
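
To make the four parameters concrete, the following minimal Python sketch computes the usual end-to-end cost of communication in the LogP model: a single message costs the sender's overhead, the network latency, and the receiver's overhead (o + L + o), and consecutive sends from one processor are spaced at least max(g, o) apart. The class name `LogP`, the method names, and the numeric values are illustrative assumptions. The abstract does not define what counts as the duration of one communication operation, so this is not the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """Hypothetical container for the four LogP parameters."""
    L: float  # latency: time a message spends in the network
    o: float  # overhead: processor time spent on each send or receive
    g: float  # gap: minimum time between consecutive sends of one processor
    P: int    # number of processors

    def message_time(self) -> float:
        # Send overhead, then network latency, then receive overhead.
        return self.o + self.L + self.o

    def burst_time(self, k: int) -> float:
        # Time until the last of k messages sent back to back by one
        # processor has been received; sends are spaced max(g, o) apart.
        if k <= 0:
            return 0.0
        return (k - 1) * max(self.g, self.o) + self.message_time()


# Illustrative parameter values only (not taken from the paper).
machine = LogP(L=6, o=2, g=4, P=8)
single = machine.message_time()
burst = machine.burst_time(3)
```

With these assumed values, a single message takes 2 + 6 + 2 time units, and a burst of three messages adds two further spacings of max(g, o) before the last one arrives.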
