Trace Cache Sampling Filter

This paper presents a new technique for efficient usage of small trace caches. A trace cache can significantly increase the performance of wide out-of-order processors, but to be effective, the size of the trace cache should be large. Power and timing considerations indicate that a small trace cache is desirable, with special mechanisms to increase its effectiveness despite the limited size. Hence several authors have proposed various filtering methods to select "good traces" for keeping in the trace cache, from among the general population of traces. This paper presents a new filtering technique, which is based on sampling. Our new technique suggests that instead of building all the traces and trying to select the good ones among them, it is more efficient to make a preliminary selection of traces. This selection is based on a random sampling approach. We show that the sampling filter improves trace cache and overall system performance, while reducing power dissipation. The sampling filter reduces admission of traces that are not used prior to their eviction from the cache, and prolongs the percentage of time a trace is in its live phase during its stay in the cache. Moreover, the sampling filter reduces duplication between the trace cache and the instruction cache and thus reduces the overall misses in the first level of cache hierarchy.

[1]  James E. Smith,et al.  Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[2]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[3]  Mateo Valero,et al.  Trace cache redundancy: red and blue traces , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[4]  Norman P. Jouppi,et al.  CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.

[5]  Gary S. Tyson,et al.  Performance Limits of Trace Caches , 1999, J. Instr. Level Parallelism.

[6]  John Paul Shen,et al.  The block-based trace cache , 1999, ISCA.

[7]  Avi Mendelson,et al.  PARROT: Power Awareness Through Selective Dynamically Optimized Traces , 2003, PACS.

[8]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[9]  Sanjay J. Patel,et al.  rePLay: A Hardware Framework for Dynamic Optimization , 2001, IEEE Trans. Computers.

[10]  Eric Rotenberg,et al.  A Trace Cache Microarchitecture and Evaluation , 1999, IEEE Trans. Computers.

[11]  Avi Mendelson,et al.  Filtering techniques to improve trace-cache efficiency , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[12]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[13]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[14]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[15]  Manish Gupta,et al.  Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors , 2000, IEEE Micro.

[16]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.