Scheduling DAGs for Asynchronous Multiprocessor Execution

A new approach is given for scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors. The key idea in our approach is to exploit the fine-grained parallelism present in the instruction stream. In this context, schedules are constructed by a careful balancing of execution and communication costs at the level of individual instructions and their data dependencies. Three methods are used to evaluate our approach. First, several existing methods are extended to the fine-grained situation considered here. Our approach is then compared to these methods using both static schedule length analyses and simulated executions of the scheduled code. In each instance, our method is found to provide significantly shorter schedules. Second, by varying parameters such as the speed of the instruction set and the parallelism in the interconnection structure, simulation techniques are used to examine the effects of various architectural considerations on the executions of the schedules. These results show that our approach provides significant speedups in a wide range of situations. Third, schedules produced by our approach are executed on a two-processor Data General shared-memory multiprocessor system. These experiments show that there is a strong correlation between our simulation results (those parameterized to "model" the Data General system) and these actual executions, and thereby serve to validate the simulation studies. Together, our results establish that fine-grained parallelism can be exploited in a substantial manner when scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors.

Index Terms— Concurrency, parallelism, multiprocessor, fine-grained parallelism, schedule, asynchronous.
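To make the core idea concrete, the following is a minimal sketch of greedy list scheduling over an instruction-level DAG in which each cross-processor data dependence pays a communication cost. This is an illustration of the general technique of balancing execution and communication costs, not the paper's own algorithm; the task names, cost values, and example DAG are hypothetical.

```python
def schedule(tasks, deps, exec_cost, comm_cost, n_procs=2):
    """Greedily place each task on the processor where it finishes earliest.

    tasks: task ids in topological order
    deps: {task: [predecessor tasks]}
    exec_cost: {task: execution time}
    comm_cost: {(pred, task): transfer time if pred runs on another processor}
    Returns {task: (processor, finish_time)}.
    """
    proc_free = [0] * n_procs   # time at which each processor next idles
    placed = {}                 # task -> (processor, finish_time)
    for t in tasks:
        best = None
        for p in range(n_procs):
            # A task may start once the processor is free and every operand
            # has arrived; edges from another processor pay the comm cost.
            ready = proc_free[p]
            for d in deps.get(t, []):
                dp, dfin = placed[d]
                arrive = dfin + (comm_cost.get((d, t), 0) if dp != p else 0)
                ready = max(ready, arrive)
            finish = ready + exec_cost[t]
            if best is None or finish < best[1]:
                best = (p, finish)
        placed[t] = best
        proc_free[best[0]] = best[1]
    return placed


# Hypothetical four-instruction diamond DAG: a -> b, a -> c, {b, c} -> d.
tasks = ["a", "b", "c", "d"]
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
exec_cost = {"a": 1, "b": 2, "c": 2, "d": 1}
comm_cost = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
print(schedule(tasks, deps, exec_cost, comm_cost))
```

On this example the greedy placement runs `b` beside `a` (avoiding one transfer) and `c` on the second processor, finishing `d` at time 5; with zero communication costs the same DAG would be split more aggressively, which is exactly the fine-grained trade-off the abstract describes.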
