Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors

The authors discuss applications of the bottom-up top-down duplication heuristic (BTDH) to list scheduling algorithms (LSAs). BTDH can be used with LSAs in two ways: combined with an LSA to form a new scheduling algorithm (LSA/BTDH), or applied as a pure optimization step to the schedule an LSA produces (LSA-BTDH). BTDH has been applied to two well-known LSAs: the highest level first with estimated times (HLFET) heuristic and the earliest task first (ETF) heuristic. Simulation results show that, given a directed acyclic graph (DAG), the graph parallelism of the DAG can accurately predict the number of processors to use so that a good schedule length and good resource utilization (efficiency) are achieved simultaneously. In terms of speedup, LSA/BTDH ≥ LSA-BTDH ≥ ETF ≥ HLFET. Experimental results of scheduling FFT programs, written in a single program multiple data (SPMD) style, on an NCUBE-2 are also presented. They confirm the simulation results and show that LSA/BTDH and LSA-BTDH achieve better speedups than the LSAs alone.
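To make the terms above concrete, the following is a minimal sketch of HLFET-style list scheduling as it is commonly described: each task's priority is its static level (the longest computation path from the task to an exit node), and the highest-priority ready task is placed on the processor that allows the earliest start. It is not the authors' implementation and does not include BTDH, whose details the abstract does not give. The function names, the four-task example DAG, its costs, and the communication model (a fixed delay per edge, zero when both tasks share a processor) are all assumptions made for illustration. Speedup and efficiency use the standard definitions, which are assumed to match the paper's.

```python
from collections import defaultdict


def static_levels(cost, succ):
    """Static level of a task: its cost plus the largest static level of any successor."""
    memo = {}

    def level(task):
        if task not in memo:
            memo[task] = cost[task] + max(
                (level(s) for s in succ.get(task, [])), default=0
            )
        return memo[task]

    for task in cost:
        level(task)
    return memo


def hlfet(cost, edges, num_procs):
    """HLFET-style list scheduling (illustrative sketch).

    cost:  {task: computation time}
    edges: {(u, v): communication delay if u and v run on different processors}
    Returns ({task: (processor, start, finish)}, schedule_length).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    levels = static_levels(cost, succ)
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placed = {}
    unscheduled = set(cost)

    while unscheduled:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in unscheduled if all(p in placed for p in pred[t])]
        # Highest static level first; ties broken by task name for determinism.
        task = min(ready, key=lambda t: (-levels[t], t))

        best_proc, best_start = None, None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after the edge delay.
            data_ready = max(
                (
                    placed[q][2] + (0 if placed[q][0] == p else edges[(q, task)])
                    for q in pred[task]
                ),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best_start is None or start < best_start:
                best_proc, best_start = p, start

        finish = best_start + cost[task]
        placed[task] = (best_proc, best_start, finish)
        proc_free[best_proc] = finish
        unscheduled.remove(task)

    return placed, max(finish for _, _, finish in placed.values())


# Hypothetical fork-join DAG: a -> {b, c} -> d, with unit communication delays.
cost = {"a": 2, "b": 3, "c": 3, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
num_procs = 2

schedule, length = hlfet(cost, edges, num_procs)
speedup = sum(cost.values()) / length  # sequential time / schedule length
efficiency = speedup / num_procs       # speedup per processor
print(schedule, length, speedup, efficiency)
```

On this example the sequential time is 10 and the schedule length is 8, giving a speedup of 1.25 and an efficiency of 0.625 on two processors. Adding processors to so small a graph cannot shorten the schedule further, which illustrates in miniature why the processor count has to be matched to the graph for speedup and efficiency to be good at the same time.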
