Critical path reduction for scalar programs

Scalar performance on processors with instruction-level parallelism (ILP) is often limited by control and data dependences. This paper describes a family of compiler techniques, called critical path reduction (CPR), that shorten the critical paths through control and data dependences. Control CPR reduces the number of branches on the critical path and improves the performance of branch-intensive code on processors with inadequate branch throughput or excessive branch latency. Data CPR reduces the number of arithmetic operations on the critical path. Optimization and scheduling are adapted to support CPR.
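
The idea behind data CPR can be illustrated with a small sketch. It is not the paper's algorithm: it only shows the general effect of reassociating a chain of additions into a balanced tree, assuming unit-latency integer adds and enough functional units to run independent adds in parallel. The function names `chain_height`, `tree_height`, and `tree_sum` are hypothetical and introduced here for illustration only.

```python
from math import ceil, log2


def chain_height(n_operands):
    """Dependent adds on the critical path of ((a0 + a1) + a2) + ..."""
    return max(n_operands - 1, 0)


def tree_height(n_operands):
    """Dependent adds on the critical path of a balanced reduction tree."""
    return ceil(log2(n_operands)) if n_operands > 1 else 0


def tree_sum(values):
    """Sum by pairwise reduction; return (total, number of tree levels).

    Each pass of the loop is one level of the tree. All adds within a
    level are independent, so an ILP processor with enough adders
    (an assumption of this sketch) could issue them in the same cycle.
    """
    level = list(values)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; an odd leftover passes through unchanged.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


total, depth = tree_sum(range(1, 9))  # eight operands: 1..8
print(total, depth, chain_height(8))
```

For eight operands the serial chain places seven dependent additions on the critical path, while the balanced tree places three. The total number of additions is unchanged; only their dependence height shrinks. Reassociation of this kind is safe for integer addition, but it can change results for floating-point arithmetic.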
