Software profiling for hot path prediction: less is more

Recently, there has been a growing interest in exploiting profile information in adaptive systems such as just-in-time compilers, dynamic optimizers and, binary translators. In this paper, we show that sophisticated software profiling schemes that provide highly accurate information in an offline setting are ill-suited for these dynamic code generation systems. We experimentally demonstrate that hot path predictions must be made early in order to control the rising cost of missed opportunity that result from the prediction delay. We also show that existing sophisticated path profiling schemes, if used in an online setting, offer no prediction advantages over simpler schemes that exhibit much lower runtime overheads.Based on these observation we developed a new low-overhead software profiling scheme for hot path prediction. Using an abstract metric we compare our scheme to path profile based prediction and show that our scheme achieves comparable prediction quality. In our second set of experiments we include runtime overhead and evaluate the performance of our scheme in a realistic application: Dynamo, a dynamic optimization system. The results show that our prediction scheme clearly outperforms path profile based prediction and thus confirm that less profiling as exhibited in our scheme will actually lead to more effective hot path prediction.

[1]  N. S. Barnett,et al.  Private communication , 1969 .

[2]  S. McFarling,et al.  Reducing the cost of branches , 1986, ISCA '86.

[3]  Scott A. Mahlke,et al.  Using profile information to assist classic code optimizations , 1991, Softw. Pract. Exp..

[4]  Joseph T. Rahmeh,et al.  Improving the accuracy of dynamic branch prediction using branch correlation , 1992, ASPLOS V.

[5]  Bjørn N. Freeman-Benson,et al.  Multi‐way versus one‐way constraints in user interfaces: Experience with the deltablue algorithm , 1993, Softw. Pract. Exp..

[6]  Yale N. Patt,et al.  A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.

[7]  Dirk Grunwald,et al.  Fast and accurate instruction fetch and branch prediction , 1994, ISCA '94.

[8]  David Keppel,et al.  Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.

[9]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[10]  James R. Larus,et al.  Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[11]  James R. Larus,et al.  Exploiting hardware performance counters with flow and context sensitive profiling , 1997, PLDI '97.

[12]  Zheng Wang,et al.  System support for automatic profiling and optimization , 1997, SOSP.

[13]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, TOCS.

[14]  Thomas Ball,et al.  Edge profiling versus path profiling: the showdown , 1998, POPL '98.

[15]  Michael Gschwind,et al.  Execution-Based Scheduling for VLIW Architectures , 1999, Euro-Par.

[16]  Erik R. Altman,et al.  BOA: Targeting Multi-Gigahertz with Binary Translation , 1999 .

[17]  Vasanth Bala,et al.  Transparent Dynamic Optimization: The Design and Implementation of Dynamo , 1999 .

[18]  Michael D. Smith,et al.  Static correlated branch prediction , 1999, TOPL.

[19]  M. Merten,et al.  A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).