Trace cache miss tolerance for deeply pipelined superscalar processors

The trace cache provides accurate, high-bandwidth instruction fetch. However, when a desired instruction trace is not found in the cache, conventional instruction fetch and decode must be used to satisfy the trace request. Such auxiliary fetch hardware can be expensive in terms of energy, area, and complexity. This paper explores a decoupled design that combines a trace cache with conventional instruction fetch hardware. The design enables the processor to switch dynamically between trace-ID-based and PC-based prediction, and it helps hide the latency of the instruction memory path. With accelerated slow-path instruction delivery and no instruction cache, the decoupled design performs comparably to a front-end with an 8 kB instruction cache (within 2% of that configuration's instructions per cycle). The design also shows high tolerance for both trace table misses and increased memory latency as the trace table is scaled down and the L2 access latency is scaled up.

Glenn Reinman | G. Pitigoi-Aron
