Branch folding in the CRISP microprocessor: reducing branch delay to zero

A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. Through a combination of hardware and software techniques, the execution-time cost of many branches can be effectively reduced to zero. Branches are folded into other instructions, so they no longer need to be executed as separate instructions. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, and it also reduces or eliminates pipeline breakage. Statistics are presented that demonstrate the effectiveness of Branch Folding and the associated techniques used in the CRISP Microprocessor.
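The instruction-count claim can be illustrated with a toy model. The sketch below is not the CRISP hardware and makes several assumptions not stated in the abstract: each non-branch instruction is given a hypothetical `next_pc` field, only unconditional jumps are folded, and the program contains no cycle made up purely of jumps. All names (`Instr`, `Folded`, `fold`, `run_unfolded`, `run_folded`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    op: str                       # "work", "jmp", or "halt"
    target: Optional[int] = None  # destination index, used only by "jmp"


@dataclass
class Folded:
    op: str        # "work" or "halt" (jumps no longer exist as instructions)
    next_pc: int   # hypothetical field: index of the instruction to run next


def run_unfolded(program: List[Instr]) -> int:
    """Execute the original program and count every instruction, jumps included."""
    pc, executed = 0, 0
    while True:
        instr = program[pc]
        executed += 1
        if instr.op == "halt":
            return executed
        pc = instr.target if instr.op == "jmp" else pc + 1


def resolve(program: List[Instr], pc: int) -> int:
    """Follow unconditional jumps until a non-jump instruction is reached.

    Assumes there is no cycle consisting only of jumps.
    """
    while program[pc].op == "jmp":
        pc = program[pc].target
    return pc


def fold(program: List[Instr]) -> Tuple[List[Folded], int]:
    """Drop every jump and record its destination in the preceding instruction."""
    keep = [i for i, ins in enumerate(program) if ins.op != "jmp"]
    new_index = {old: new for new, old in enumerate(keep)}
    folded = []
    for old in keep:
        ins = program[old]
        if ins.op == "halt":
            nxt = new_index[old]  # never used; halt has no successor
        else:
            nxt = new_index[resolve(program, old + 1)]
        folded.append(Folded(ins.op, nxt))
    entry = new_index[resolve(program, 0)]
    return folded, entry


def run_folded(folded: List[Folded], entry: int) -> int:
    """Execute the folded program; no jump is ever issued on its own."""
    pc, executed = entry, 0
    while True:
        ins = folded[pc]
        executed += 1
        if ins.op == "halt":
            return executed
        pc = ins.next_pc


program = [
    Instr("work"),      # 0
    Instr("jmp", 4),    # 1: folded into instruction 0
    Instr("work"),      # 2: skipped at run time
    Instr("work"),      # 3: skipped at run time
    Instr("work"),      # 4
    Instr("jmp", 7),    # 5: folded into instruction 4
    Instr("work"),      # 6: skipped at run time
    Instr("halt"),      # 7
]

folded, entry = fold(program)
unfolded_count = run_unfolded(program)
folded_count = run_folded(folded, entry)
print(unfolded_count, folded_count)
```

In this toy program the unfolded run issues two jumps as separate instructions and the folded run issues none, so the executed-instruction count drops by exactly the number of branches taken. Conditional branches, which need a predicted and an alternate path, are deliberately left out of the sketch.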
