Branch merging for scheduling concurrent executions of branch operations

Branches are a major limiting factor to instruction-level parallelism. One solution is to execute several branches simultaneously using multiway branching architectures. Such architectures are especially important when the instruction issue width becomes large. The authors study the problem of compile-time scheduling of branch operations on such architectures: an optimisation called branch merging. The scheduling attempts to bring profitable branches together for concurrent execution. It is shown that finding the optimal solution to the branch merging problem is NP-hard. A heuristic is then proposed, which relies on a cost model to direct the merging of branches and their associated basic blocks. Merged branches are then scheduled together for concurrent execution. The authors used simulation to evaluate the effectiveness of the proposed algorithm. Experiments on selected benchmark programs show that the heuristic achieves roughly a 10% performance improvement on multiway branching architectures.
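The abstract does not spell out the cost model or the heuristic, so the following is only a minimal illustrative sketch of how a cost-directed greedy merge could look, not the authors' algorithm. Everything in it is an assumption made for illustration: the `Group` record, the linear chain of branches, the expected-cycle cost function, the fixed `overhead` charged per merge, and the `max_ways` limit standing in for the width of the multiway branch unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """Hypothetical unit: a basic block plus the branch(es) that end it."""
    cycles: float       # cycles needed to execute this group
    fallthrough: float  # probability that control continues to the next group
    width: int = 1      # number of original branches merged into this group


def expected_cycles(groups: List[Group]) -> float:
    """Assumed cost model: each group's cycles weighted by the chance of reaching it."""
    total, reach = 0.0, 1.0
    for g in groups:
        total += reach * g.cycles
        reach *= g.fallthrough
    return total


def merge(a: Group, b: Group, overhead: float) -> Group:
    """Assume the two blocks overlap fully in one schedule, plus a fixed overhead."""
    return Group(
        cycles=max(a.cycles, b.cycles) + overhead,
        fallthrough=a.fallthrough * b.fallthrough,
        width=a.width + b.width,
    )


def greedy_merge(groups: List[Group], max_ways: int, overhead: float) -> List[Group]:
    """Repeatedly merge the adjacent pair with the largest positive estimated gain."""
    groups = list(groups)
    while True:
        base = expected_cycles(groups)
        best_gain, best_i = 0.0, None
        for i in range(len(groups) - 1):
            # Respect the (assumed) width of the multiway branch unit.
            if groups[i].width + groups[i + 1].width > max_ways:
                continue
            merged = merge(groups[i], groups[i + 1], overhead)
            trial = groups[:i] + [merged] + groups[i + 2:]
            gain = base - expected_cycles(trial)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return groups  # no remaining merge is estimated to be profitable
        groups[best_i:best_i + 2] = [merge(groups[best_i], groups[best_i + 1], overhead)]


# Example: three branches in a chain, a 2-way branch unit, half a cycle of merge overhead.
chain = [Group(2.0, 0.9), Group(2.0, 0.9), Group(2.0, 0.1)]
merged_chain = greedy_merge(chain, max_ways=2, overhead=0.5)
print(expected_cycles(chain), expected_cycles(merged_chain))
```

A greedy pass is used here because the abstract states that the optimal merging problem is NP-hard. Under this toy model, merging pays off only when the following branch is likely to be reached; a branch that is rarely reached is left unmerged because the overhead outweighs the saved cycles.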
