eXtended Block Cache

This paper describes a new instruction-supply mechanism called the eXtended Block Cache (XBC). The goal of the XBC is to improve on the hit rate of the Trace Cache (TC) while providing the same bandwidth. The improved hit rate is achieved by making the XBC a nearly redundancy-free structure. The basic unit recorded in the XBC is the extended block (XB), a multiple-entry, single-exit instruction block. An XB is a sequence of instructions ending on a conditional or an indirect branch; unconditional direct jumps do not end an XB. To enable multiple entry points per XB, the XB index is derived from the IP of its ending instruction. Instructions within an XB are recorded in reverse order, which makes XBs easy to extend. The multiple entry points remove most of the redundancy. Since there is at most one conditional branch per XB, up to n XBs can be fetched per cycle by predicting n branches. This multiple fetch enables the XBC to match the TC bandwidth.
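
The indexing idea can be illustrated with a minimal software sketch. It is not the hardware organization described in the paper: the names `ExtendedBlock`, `XBCSketch`, `record`, and `lookup` are hypothetical, instructions are modeled as `(ip, text)` pairs, and the sketch assumes that every path recorded for a given ending branch shares the same instruction suffix (paths that diverge, for example through different unconditional jumps, are simply not merged here).

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedBlock:
    """One XB: instructions stored in reverse order, ending branch first."""
    end_ip: int
    reversed_insts: list = field(default_factory=list)

    def fetch(self, length):
        """Return the last `length` instructions in program order, or None."""
        if length > len(self.reversed_insts):
            return None  # entry point lies before what has been recorded
        return list(reversed(self.reversed_insts[:length]))


class XBCSketch:
    """Hypothetical model: XBs are indexed by the IP of their ending instruction."""

    def __init__(self):
        self.blocks = {}

    def record(self, insts):
        """Record a program-order sequence that ends on the XB's final branch.

        Returns True if the sequence was stored or merged, False if it
        conflicts with the stored suffix (not handled in this sketch).
        """
        end_ip = insts[-1][0]
        xb = self.blocks.setdefault(end_ip, ExtendedBlock(end_ip))
        rev = insts[::-1]
        shared = min(len(rev), len(xb.reversed_insts))
        if rev[:shared] != xb.reversed_insts[:shared]:
            return False  # assumption violated: suffixes differ
        # Reverse order makes extension an append: earlier instructions
        # are added at the tail without moving the ones already stored.
        xb.reversed_insts.extend(rev[shared:])
        return True

    def lookup(self, end_ip, length):
        """Fetch `length` instructions ending at `end_ip`; None on a miss."""
        xb = self.blocks.get(end_ip)
        return xb.fetch(length) if xb else None


xbc = XBCSketch()
short_path = [(0x108, "add"), (0x10C, "cmp"), (0x110, "jne")]
long_path = [(0x100, "load"), (0x104, "mul")] + short_path
xbc.record(short_path)  # enters the block at 0x108
xbc.record(long_path)   # enters earlier, at 0x100: extends the same XB
```

In this sketch both entry points share a single stored block, because the key is the ending branch at `0x110` rather than the starting address. A cache keyed by starting address would have stored the three common instructions twice, which is the kind of redundancy the abstract refers to.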
