Branch Target Buffer Design and Optimization

A branch target buffer (BTB) can reduce the performance penalty of branches in pipelined processors by predicting the path of the branch and caching information used by the branch. Two major issues in the design of BTBs that achieve maximum performance with a limited number of bits allocated to the BTB implementation are discussed. The first is BTB management. A method for discarding branches from the BTB is examined. This method discards the branch with the smallest expected value for improving performance; it outperforms the least recently used (LRU) strategy by a small margin, at the cost of additional complexity. The second issue is what information to store in the BTB. A BTB entry can consist of one or more of the following: the branch tag, prediction information, the branch target address, and instructions at the branch target. Various BTB designs, with one or more of these fields, are evaluated and compared.
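As a rough illustration of the entry fields and the LRU baseline mentioned above, the following is a minimal sketch and not the design evaluated in the paper. It assumes a fully associative buffer, uses the branch address as the tag, and stands in a 2-bit saturating counter for the "prediction information" field; the class and method names (`BTBEntry`, `BranchTargetBuffer`, `predict`, `update`) are hypothetical, and the expected-value discard method is not modeled because the abstract does not specify it.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int       # cached branch target address
    counter: int = 2  # assumed 2-bit saturating counter (0..3); >= 2 predicts taken


class BranchTargetBuffer:
    """Fully associative BTB sketch with LRU replacement (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Key is the branch address, acting as the tag; least recently used first.
        self.entries = OrderedDict()

    def predict(self, pc):
        """Return the predicted target, or None for 'not taken' or a BTB miss."""
        entry = self.entries.get(pc)
        if entry is None:
            return None
        self.entries.move_to_end(pc)  # mark as most recently used
        return entry.target if entry.counter >= 2 else None

    def update(self, pc, taken, target):
        """Record the resolved outcome of the branch at address pc."""
        entry = self.entries.get(pc)
        if entry is None:
            if not taken:
                return  # assumption: allocate entries only for taken branches
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # discard the LRU branch
            self.entries[pc] = BTBEntry(target=target)
            return
        self.entries.move_to_end(pc)
        if taken:
            entry.counter = min(3, entry.counter + 1)
            entry.target = target
        else:
            entry.counter = max(0, entry.counter - 1)
```

Dropping the `target` field, or adding a field holding instructions at the target, would correspond to the other entry layouts the abstract lists; the trade-off is bits per entry against the number of entries that fit in a fixed bit budget.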