On Message Packaging in Task Scheduling for Distributed Memory Parallel Machines

In this paper, we report a performance gap between a schedule with a small makespan under the task scheduling model and the corresponding parallel program on a distributed memory parallel machine. The main cause of the gap is the software overhead of interprocessor communication; as a result, speedup ratios of schedules on the model do not approximate well the speedup ratios of parallel programs on the machines. The purpose of this paper is a task scheduling algorithm that generates a schedule with both a small makespan and a good approximation to the performance of the corresponding parallel program. To this end, we propose algorithm BCSH, which generates only bulk synchronous schedules. In such a schedule, no-communication phases and communication phases alternate, and all interprocessor communication occurs in the latter phases; the corresponding parallel program can therefore easily exploit the message packaging technique. That technique reduces the many software overheads of messages sent from one source processor to the same destination processor to roughly a single overhead, and improves the performance of a parallel program significantly. Finally, we show experimental results on the performance gaps of BCSH, Kruatrachue's algorithm DSH, and Ahmad et al.'s algorithm ECPFD. The schedules produced by DSH and ECPFD are known for their small makespans, but message packaging cannot be applied effectively to the corresponding programs. The results show that a bulk synchronous schedule with a small makespan has two advantages: the gap is small, and the corresponding program is a high-performance parallel program.
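The message-packaging effect described above can be sketched with a toy cost model. Everything here (the constants, the function names, the linear cost formula) is illustrative and not taken from the paper; it only shows why merging messages to the same destination amortizes the per-message software overhead.

```python
# Toy cost model for message packaging (illustrative values only).
PER_MESSAGE_OVERHEAD = 10   # assumed fixed software overhead per send
PER_BYTE_COST = 1           # assumed transfer cost per payload byte

def cost_unpacked(messages):
    """Every message pays its own software overhead."""
    return sum(PER_MESSAGE_OVERHEAD + PER_BYTE_COST * len(payload)
               for _dest, payload in messages)

def cost_packed(messages):
    """Messages to the same destination are packed and share one overhead."""
    by_dest = {}
    for dest, payload in messages:
        by_dest.setdefault(dest, []).append(payload)
    return sum(PER_MESSAGE_OVERHEAD +
               PER_BYTE_COST * sum(len(p) for p in payloads)
               for payloads in by_dest.values())

# Four messages from one source, three of them to the same destination.
msgs = [(1, b"aaaa"), (1, b"bb"), (2, b"ccc"), (1, b"d")]
print(cost_unpacked(msgs))  # 4 overheads + 10 bytes = 50
print(cost_packed(msgs))    # 2 overheads + 10 bytes = 30
```

In a bulk synchronous schedule, all messages of a communication phase are known before the phase begins, so this grouping-by-destination step is straightforward; in a schedule where communication is interleaved with computation, messages to the same destination are separated in time and cannot be packed as easily.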

[1]  Hesham H. Ali, et al.  Task scheduling in parallel and distributed systems, 1994, Prentice Hall series in innovative technology.

[2]  Ronald L. Graham, et al.  Bounds on Multiprocessing Timing Anomalies, 1969, SIAM Journal on Applied Mathematics.

[3]  Nobuhiko Koike.  NEC Cenju-3: a microprocessor-based parallel computer, 1994, Proceedings of the 8th International Parallel Processing Symposium.

[4]  Satoshi Matsuoka, et al.  OMPI: Optimizing MPI Programs using Partial Evaluation, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[5]  Philippe Chrétienne, et al.  C.P.M. Scheduling with Small Communication Delays and Task Duplication, 1991, Oper. Res.

[6]  Behrooz Shirazi, et al.  Comparative study of task duplication static scheduling versus clustering and non-clustering techniques, 1995, Concurr. Pract. Exp.

[7]  Dharma P. Agrawal, et al.  Optimal Scheduling Algorithm for Distributed-Memory Machines, 1998, IEEE Trans. Parallel Distributed Syst.

[8]  Leslie G. Valiant, et al.  A bridging model for parallel computation, 1990, CACM.

[9]  Mihalis Yannakakis, et al.  Towards an architecture-independent analysis of parallel algorithms, 1990, STOC '88.

[10]  Kai Hwang, et al.  Advanced computer architecture: parallelism, scalability, programmability, 1992.

[11]  S. Ranka, et al.  Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors, 1992, Proceedings Supercomputing '92.

[12]  Yaacov Yesha, et al.  A Scheduling Principle for Precedence Graphs with Communication Delay, 1992, International Conference on Parallel Processing.

[13]  Ishfaq Ahmad, et al.  On Exploiting Task Duplication in Parallel Program Scheduling, 1998, IEEE Trans. Parallel Distributed Syst.

[14]  Hiroaki Ishihata, et al.  An architecture of highly parallel computer AP1000, 1991, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference Proceedings.

[15]  David B. Skillicorn, et al.  Questions and Answers about BSP, 1997, Sci. Program.

[16]  Jing-Chiou Liou, et al.  Task Clustering and Scheduling for Distributed Memory Parallel Architectures, 1996, IEEE Trans. Parallel Distributed Syst.