Optimal and near optimal tree scheduling for parallel systems

The authors present seven algorithms for multiprocessor scheduling of task trees. The algorithms aim to minimize parallel time (the time between the start of the first processor and the completion of the last processor) in an environment where interprocessor communication costs are significant. Test results are given for implementations of (1) an optimal algorithm, whose schedules cannot be improved upon, (2) a greedy algorithm with minimal overhead, and (3) a 'light load' algorithm that combines the best features of the optimal and greedy algorithms. The authors illustrate the trade-off between generating optimal schedules and building scheduling programs that complete their allocation in a reasonable amount of time. They also give a new NP-completeness result.
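To make the objective concrete, the following is a minimal sketch of a generic greedy list scheduler for a task tree with communication costs. It is not one of the authors' seven algorithms. The names `Task` and `greedy_schedule`, the in-tree model (a parent runs only after all its children finish), and the rule that a communication cost is paid only when child and parent run on different processors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task-tree node: a parent waits for all of its children."""
    name: str
    cost: float
    # Each entry is (child_task, comm_cost); comm_cost is paid only if the
    # child and this task run on different processors (an assumed model).
    children: list = field(default_factory=list)


def greedy_schedule(root, num_procs):
    """Place each task, leaves first, on the processor giving the earliest finish.

    Returns (placement, parallel_time), where placement maps a task name to
    (processor, finish_time) and parallel_time is the latest finish time,
    with all processors assumed to start at time 0.
    """
    proc_free = [0.0] * num_procs   # time at which each processor becomes idle
    placement = {}

    def visit(task):
        for child, _ in task.children:
            visit(child)
        best = None
        for p in range(num_procs):
            # Earliest time all child results are available on processor p.
            ready = 0.0
            for child, comm in task.children:
                child_proc, child_finish = placement[child.name]
                delay = 0.0 if child_proc == p else comm
                ready = max(ready, child_finish + delay)
            finish = max(ready, proc_free[p]) + task.cost
            if best is None or finish < best[1]:
                best = (p, finish)
        placement[task.name] = best
        proc_free[best[0]] = best[1]

    visit(root)
    return placement, max(finish for _, finish in placement.values())


# Two leaves (cost 2 each) feed a root (cost 1); each edge costs 5 to communicate.
leaf_a = Task("a", 2.0)
leaf_b = Task("b", 2.0)
root = Task("root", 1.0, [(leaf_a, 5.0), (leaf_b, 5.0)])

_, time_two_procs = greedy_schedule(root, 2)
_, time_one_proc = greedy_schedule(root, 1)
print(time_two_procs, time_one_proc)
```

In this small instance the greedy rule spreads the two leaves across both processors and then pays the communication delay at the root, so its parallel time is longer than simply running everything on one processor. This is the kind of gap between a cheap heuristic and an optimal schedule that the trade-off above refers to.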
