An efficient resource-constrained global scheduling technique for superscalar and VLIW processors

We describe a new algorithm for parallelization of sequential code that eliminates anti and output dependences by renaming registers on an as-needed basis during scheduling. A dataflow attribute at the beginning of each basic block indicates which operations are available for moving up through that basic block. Scheduling consists of choosing the best operation from the set of operations that can move to a point, moving the instances of the operation to the point, making bookkeeping copies for edges that join the moving path but are not on it, and updating the dataflow attributes of basic blocks only on the paths that were traversed by the instances of the moved operation. For better efficiency, the code motions are done globally, without going through the atomic transformations of percolation scheduling. For performing the code motions, we use an intermediate representation that is directly executable as sequential RISC code, rather than VLIW code. As a result, the new algorithm can also be used to generate parallelized code for multiple-ALU superscalar processors. The enhanced pipeline scheduling algorithm for software pipelining of arbitrary code is reformulated within the framework of the new sequential RISC representation. The new algorithm has been implemented, and preliminary results on AIX utilities indicate that it requires significantly less compilation time than the percolation scheduling approach.
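The as-needed renaming idea can be illustrated with a minimal sketch. It is not the paper's implementation: it handles only a straight-line list of operations (no basic-block boundaries, bookkeeping copies, or dataflow attributes), ignores memory dependences, and assumes the supplied fresh register names are unused elsewhere. The names `Op`, `hoist`, and `fresh_regs` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Op:
    name: str    # mnemonic, e.g. "add" (illustrative only)
    dest: str    # register written by the operation
    srcs: tuple  # registers read by the operation


def fresh_regs():
    """Yield register names assumed not to occur in the program."""
    return (f"t{i}" for i in count())


def hoist(ops, src_idx, dst_idx, regs):
    """Move ops[src_idx] up to position dst_idx (dst_idx <= src_idx).

    Returns the new operation list, or None when a true (flow)
    dependence blocks the move. Anti and output dependences are removed
    by renaming the destination, but only when one is actually present.
    """
    op = ops[src_idx]
    passed = ops[dst_idx:src_idx]  # operations the moving op would pass

    # Flow dependence: the moving op reads a value a passed op writes.
    if any(p.dest in op.srcs for p in passed):
        return None

    # Anti dependence: a passed op reads op.dest.
    # Output dependence: a passed op writes op.dest.
    conflict = any(op.dest in p.srcs or p.dest == op.dest for p in passed)
    if not conflict:
        return ops[:dst_idx] + [op] + passed + ops[src_idx + 1:]

    # Rename: compute into a fresh register early, and leave a copy at
    # the original position so later readers still see op.dest.
    tmp = next(regs)
    moved = Op(op.name, tmp, op.srcs)
    copy = Op("copy", op.dest, (tmp,))
    return ops[:dst_idx] + [moved] + passed + [copy] + ops[src_idx + 1:]
```

For example, hoisting a load into `r1` above an `add` that writes `r1` and a `mul` that reads `r1` triggers renaming: the load is moved up with a fresh destination, and a `copy` into `r1` remains at the load's original position. When no anti or output dependence exists, the operation moves unchanged, so no extra registers are consumed.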
