An efficient resource-constrained global scheduling technique for superscalar and VLIW processors

We describe a new algorithm for parallelization of sequential code that eliminates anti- and output dependences by renaming registers on an as-needed basis during scheduling. A dataflow attribute at the beginning of each basic block indicates which operations are available for moving up through that basic block. Scheduling consists of choosing the best operation from the set of operations that can move to a point, moving the instances of the operation to the point, making bookkeeping copies for edges that join the moving path but are not on it, and updating the dataflow attributes of basic blocks only on the paths that were traversed by the instances of the moved operations. For better efficiency, the code motions are done globally, without going through the atomic transformations of percolation scheduling. To perform the code motions, we use an intermediate representation that is directly executable as sequential RISC code, rather than VLIW code. As a result, the new algorithm can also be used to generate parallelized code for multiple-ALU superscalar processors. The enhanced pipeline scheduling algorithm for software pipelining of arbitrary code is reformulated within the framework of the new sequential RISC representation. The new algorithm has been implemented, and preliminary results on AIX utilities indicate that it requires significantly less compilation time than the percolation scheduling approach.

Keywords: Instruction-level parallelism, Compile-time parallelization, VLIW, Superscalar
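As a rough illustration of what renaming registers on an as-needed basis can mean, the following Python sketch moves one operation to the top of a single straight-line block. The `Op` class, the `hoist_with_renaming` function, and the register names are hypothetical and do not come from the paper. The sketch also assumes that the caller supplies a register name unused elsewhere in the block, and it ignores basic-block boundaries, bookkeeping copies, dataflow attributes, and resource constraints.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """A three-address operation: dest = name(srcs). Hypothetical model."""
    dest: str
    srcs: Tuple[str, ...]
    name: str = "op"


def hoist_with_renaming(block: List[Op], index: int, fresh: str) -> List[Op]:
    """Move block[index] to the top of a straight-line block.

    The destination is renamed to `fresh` only when an anti or output
    dependence would otherwise block the move. `fresh` is assumed to be
    a register that appears nowhere else in the block.
    """
    op = block[index]
    above = block[:index]
    below = block[index + 1:]

    # A true (flow) dependence cannot be removed by renaming.
    if any(prev.dest in op.srcs for prev in above):
        raise ValueError("true dependence: operation cannot be hoisted")

    # Anti dependence: an earlier op reads the register this op overwrites.
    # Output dependence: an earlier op writes the same register.
    blocked = any(op.dest in prev.srcs or prev.dest == op.dest for prev in above)
    if not blocked:
        # No renaming needed; move the operation unchanged.
        return [op] + above + below

    # Rename the destination and leave a copy at the original position
    # so that later readers of op.dest still see the right value.
    renamed = Op(fresh, op.srcs, op.name)
    copy = Op(op.dest, (fresh,), "copy")
    return [renamed] + above + [copy] + below


block = [
    Op("r1", ("r2", "r3"), "add"),
    Op("r4", ("r1",), "mul"),
    Op("r1", ("r5", "r6"), "add"),  # anti dep on the mul, output dep on the first add
]
scheduled = hoist_with_renaming(block, 2, "r7")
for op in scheduled:
    print(op)
```

In this example the last operation cannot move up as written, because `r1` is still read by the multiply and written by the first add. Renaming its destination to `r7` removes both constraints, and the copy `r1 = r7` left at the original position preserves the sequential semantics.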
