Critical Issues Regarding the Trace Cache Fetch Mechanism

To meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logically contiguous instructions in physically contiguous storage. When a particular basic block is requested, the trace cache can potentially respond with the requested block along with several blocks that followed it when the block was last encountered. In this technical report, we examine some critical features of a trace cache mechanism designed for a 16-wide issue processor and evaluate their effects on performance. We examine features such as cache associativity, storage partitioning, branch predictor design, instruction cache design, and fill unit design. We compare the performance of our trace cache mechanism with that of the design presented by Rotenberg et al. [19] and show a 23% improvement in performance. In our final analysis, we compare our trace cache mechanism with an aggressive single basic block fetch mechanism and show that the trace cache mechanism attains a 24% improvement in performance.
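The fetch behavior described above can be illustrated with a minimal sketch. It is not the mechanism evaluated in the report: the class name, the method names, and the limit of three blocks per trace are assumptions made for illustration, basic blocks are represented only by their start addresses, and the branch prediction check that a real trace cache performs before supplying a trace is omitted.

```python
from dataclasses import dataclass, field

# Assumption for illustration: a trace holds at most this many basic blocks.
MAX_BLOCKS_PER_TRACE = 3


@dataclass
class TraceCache:
    """Toy model: maps a starting block address to the blocks that followed it."""

    # start address -> tuple of block addresses stored contiguously as one trace
    lines: dict = field(default_factory=dict)

    def fill(self, executed_blocks):
        """Toy stand-in for a fill unit: cut the dynamic block sequence into traces."""
        for i in range(0, len(executed_blocks), MAX_BLOCKS_PER_TRACE):
            segment = tuple(executed_blocks[i:i + MAX_BLOCKS_PER_TRACE])
            # A later trace with the same start address overwrites the earlier
            # one, so the cache reflects the most recent encounter.
            self.lines[segment[0]] = segment

    def fetch(self, block_addr):
        """Hit: the requested block plus its recorded successors. Miss: the block alone."""
        return self.lines.get(block_addr, (block_addr,))


tc = TraceCache()
tc.fill([0x100, 0x140, 0x200, 0x300, 0x340])
hit = tc.fetch(0x100)   # three blocks supplied in one fetch
miss = tc.fetch(0x140)  # not a trace start, so only the requested block
```

In this sketch a hit supplies up to three blocks in a single fetch, while a miss falls back to one block, which corresponds to the single basic block fetch that the abstract uses as its point of comparison.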

[1] James E. Smith, et al. A study of branch prediction strategies, 1981, ISCA '98.

[2] Nader Bagherzadeh, et al. Multiple branch and block prediction, 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[3] Eric Rotenberg, et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[4] Y. Patt, et al. Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[5] Rachel Yung. Design decisions influencing the UltraSPARC's instruction fetch architecture, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[6] Yale N. Patt, et al. Improving branch prediction accuracy by reducing pattern history table interference, 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[7] Pascal Sainrat, et al. Multiple-block ahead branch predictors, 1996, ASPLOS VII.

[8] M. Franklin, et al. Control flow prediction with tree-like subgraphs for superscalar processors, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[9] M. Franklin, et al. Improving CISC instruction decoding performance using a fill unit, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[10] Burzin A. Patel, et al. Optimization of instruction fetch mechanisms for high issue rates, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[11] Yale N. Patt, et al. Facilitating superscalar processing via a combined static/dynamic register renaming scheme, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[12] Manoj Franklin, et al. A fill-unit approach to multiple instruction issue, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[13] Yale N. Patt, et al. Branch Classification: A New Mechanism for Improving Branch Predictor Performance, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[14] Norman P. Jouppi, et al. WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches, 1994.

[15] Yale N. Patt, et al. Increasing the instruction fetch rate via multiple branch prediction and a branch address cache, 1993, ICS '93.

[16] S. McFarling. Combining Branch Predictors, 1993.

[17] D.R. Kaeli, et al. Branch history table prediction of moving target branches due to subroutine returns, 1991, Proceedings. The 18th Annual International Symposium on Computer Architecture.

[18] Yale N. Patt, et al. Performance benefits of large execution atomic units in dynamically scheduled machines, 1989, ICS '89.

[19] Yale N. Patt, et al. Hardware Support For Large Atomic Units in Dynamically Scheduled Machines, 1988, Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21.

[20] Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS.

[21] Yale N. Patt, et al. HPS, a new microarchitecture: rationale and introduction, 1985, MICRO 18.

[22] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.