Analyzing Effects of Trace Cache Configurations on the Prediction of Indirect Branches

This paper discusses the effects of using a trace cache on indirect branch prediction in ILP processors. The main contribution is an exploration of the fact that the trace cache captures context information about the program's recent control flow, which can improve the accuracy of predictors that do not themselves explicitly use such information. We analyze and experiment with various trace cache configurations and strategies to measure their effects on indirect branch prediction accuracy. We first show that updating indirect branch target addresses in the trace cache improves indirect branch prediction accuracy. We then incrementally vary the trace cache configuration (applying trace packing, adding 2-bit update counters per trace cache line, and varying the set associativity, cache size, and cache line size) to observe the impact of each change on indirect branch prediction. We simulate a wide variety of designs using benchmarks with higher-than-average numbers of indirect branches. On these benchmarks, the harmonic mean indirect branch prediction accuracy is 42.04% for a processor model with a trace cache that updates indirect branch target addresses, compared to 28.82% for a model with a trace cache that does not update them, and 10.85% for a model with a branch target buffer. Our results have implications for any hardware predictor that stores entries corresponding to (possibly replicated) instructions in the trace cache rather than to the original instructions in main memory.
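The headline figures above are harmonic means over the benchmark suite. As a minimal sketch of how such a summary is computed, the snippet below takes per-benchmark accuracies as fractions. The function name `harmonic_mean_accuracy` and the sample values are hypothetical illustrations, not data or code from the paper.

```python
def harmonic_mean_accuracy(accuracies):
    """Return the harmonic mean of per-benchmark accuracies.

    Each accuracy is a fraction in (0, 1]. The harmonic mean is
    n / sum(1 / a_i), so benchmarks with low accuracy pull the
    summary down more than they would in an arithmetic mean.
    """
    if not accuracies:
        raise ValueError("need at least one benchmark accuracy")
    if any(a <= 0 for a in accuracies):
        raise ValueError("accuracies must be positive")
    return len(accuracies) / sum(1.0 / a for a in accuracies)


if __name__ == "__main__":
    # Hypothetical per-benchmark accuracies, for illustration only.
    sample = [0.50, 0.25, 0.50]
    print(harmonic_mean_accuracy(sample))
```

For the hypothetical sample, the harmonic mean is 0.375, below the arithmetic mean of about 0.417, which shows how a single poorly predicted benchmark weighs on the reported figure.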
