Static scheduling for barrier MIMD architectures

In a SIMD or VLIW machine, conceptual synchronizations are accomplished by a static code schedule that requires no run-time synchronization. The absence of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility.

In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to most MIMD machines, this paper centers on analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). They also incorporate a special hardware barrier synchronization mechanism, which facilitates static scheduling by giving the compiler a means of enforcing precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point it inserts a barrier to reduce that imprecision to zero (or a small constant).

This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, we present experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine.
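The barrier insertion idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Op` record, the `place_barriers` function, the lock-step view of one operation per processor per step, and the definition of imprecision as the gap between the latest and earliest possible completion times across processors are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical instruction with a bounded, variable execution time."""
    name: str
    min_time: int
    max_time: int


def place_barriers(streams, max_imprecision):
    """Return the step indices before which a barrier is inserted.

    streams: one list of Op per processor, all of equal length.
    Assumption: imprecision is max(latest) - min(earliest) over all
    processors, measured from the most recent barrier (or program start).
    """
    n_procs = len(streams)
    earliest = [0] * n_procs   # earliest possible completion time so far
    latest = [0] * n_procs     # latest possible completion time so far
    steps_since_barrier = 0
    barriers = []

    for step in range(len(streams[0])):
        ops = [stream[step] for stream in streams]
        new_earliest = [e + op.min_time for e, op in zip(earliest, ops)]
        new_latest = [l + op.max_time for l, op in zip(latest, ops)]

        too_imprecise = max(new_latest) - min(new_earliest) > max_imprecision
        if too_imprecise and steps_since_barrier > 0:
            # A barrier realigns all processors, so timing restarts at zero.
            barriers.append(step)
            new_earliest = [op.min_time for op in ops]
            new_latest = [op.max_time for op in ops]
            steps_since_barrier = 0

        earliest, latest = new_earliest, new_latest
        steps_since_barrier += 1

    return barriers


# Two processors, four steps; only processor 0 has variable-time operations.
demo_streams = [
    [Op("a0", 1, 1), Op("a1", 1, 3), Op("a2", 1, 2), Op("a3", 2, 2)],
    [Op("b0", 1, 1), Op("b1", 1, 1), Op("b2", 1, 1), Op("b3", 2, 2)],
]
print(place_barriers(demo_streams, max_imprecision=2))  # [2]
```

In this example the accumulated imprecision reaches 2 after step 1 and would reach 3 at step 2, so a single barrier is placed before step 2. A real scheduler would also account for data dependences and control flow, which this sketch ignores.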

[1] T. C. Hu, et al. Combinatorial algorithms, 1982.

[2] Stuart E. Dreyfus, et al. An Appraisal of Some Shortest-Path Algorithms, 1969, Oper. Res.

[3] Henry G. Dietz, et al. Extending Static Synchronization Beyond SIMD and VLIW, 1988.

[4] Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS.

[5] John R. Ellis, et al. Bulldog: A Compiler for VLIW Architectures, 1986.

[6] Hironori Kasahara, et al. Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing, 1984, IEEE Transactions on Computers.

[7] Joseph A. Fisher. The VLIW Machine: A Multiprocessor for Compiling Scientific Code, 1984, Computer.

[8] Henry G. Dietz, et al. Static synchronization beyond VLIW, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[9] Peter C. Fishburn, et al. Interval orders and interval graphs: a study of partially ordered sets, 1985.

[10] B. Fox. Calculating Kth Shortest Paths, 1973.

[11] Alfred V. Aho, et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.

[12] Richard Pavley, et al. A Method for the Solution of the Nth Best Path Problem, 1959, JACM.

[13] Henry G. Dietz, et al. Hardware Barrier Synchronization: Dynamic Barrier MIMD (DBM), 1990, ICPP.

[14] David B. Wortman, et al. Static and Dynamic Characteristics of XPL Programs, 1975, Computer.

[16] Phillip L. Shaffer. Minimization of Interprocessor Synchronization In Multiprocessors with Shared and Private Memory, 1989, ICPP.

[17] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms, 1974.

[18] Henry G. Dietz, et al. Hardware Barrier Synchronization: Static Barrier MIMD (SBM), 1990, ICPP.

[19] Thomas L. Casavant, et al. Experimental Application-Driven Architecture Analysis of an SIMD/MIMD Parallel Processing System, 1990, IEEE Trans. Parallel Distributed Syst.