Better global scheduling using path profiles

Path profiles record the frequencies of execution paths through a program. Until now, the best global instruction schedulers have relied upon profile-gathered frequencies of conditional branch directions to select sequences of basic blocks that only approximate the frequently executed program paths. The identified sequences are then enlarged using the profile data to improve the scope of scheduling. Finally, the enlarged regions are compacted so that they complete in a small number of cycles. Path profiles remove the need to approximate the frequently executed paths that are so important to the success of the compaction phase. In this paper, we describe how one can modify a trace-based instruction scheduler, and in particular a superblock scheduler, to use path profiles in both the selection and enlargement phases of global scheduling. As our experimental results demonstrate, the use of more detailed profile data allows the scheduler to construct superblocks that are more likely to avoid early exits. This effect leads to more useful speculative code motions and an overall improvement in program performance. We also describe how a path-profile-based approach can simplify the engineering of a trace-based scheduler by unifying several trace-enlargement heuristics into a single general mechanism.
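To illustrate why branch-direction frequencies only approximate the hot paths, the following minimal sketch compares the two kinds of profile on a toy control-flow graph. Everything in it is an assumption made for illustration: the block names, the counts, and the greedy "follow the most frequent successor" rule, which is a simplified stand-in for a real trace-selection heuristic and not the scheduler described in the paper.

```python
from collections import Counter

# Hypothetical path profile: each key is one executed path through a toy
# CFG (A branches to B or C, both rejoin at D, D branches to E or F);
# each value is how often that path ran.
PATH_PROFILE = {
    ("A", "B", "D", "E"): 45,
    ("A", "C", "D", "F"): 40,
    ("A", "C", "D", "E"): 15,
}


def edge_profile(path_profile):
    """Collapse a path profile into per-edge (branch-direction) counts."""
    edges = Counter()
    for path, count in path_profile.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges


def select_trace_from_edges(edges, entry):
    """Greedy selection: from each block, follow the most frequent edge."""
    trace = [entry]
    while True:
        successors = {dst: n for (src, dst), n in edges.items()
                      if src == trace[-1]}
        if not successors:
            return tuple(trace)
        nxt = max(successors, key=successors.get)
        if nxt in trace:  # stop rather than follow a back edge
            return tuple(trace)
        trace.append(nxt)


def select_trace_from_paths(path_profile):
    """With a path profile, the hottest path can be read off directly."""
    return max(path_profile, key=path_profile.get)


if __name__ == "__main__":
    edges = edge_profile(PATH_PROFILE)
    edge_trace = select_trace_from_edges(edges, "A")
    path_trace = select_trace_from_paths(PATH_PROFILE)
    print("edge-based trace:", edge_trace, "ran", PATH_PROFILE[edge_trace])
    print("path-based trace:", path_trace, "ran", PATH_PROFILE[path_trace])
```

In this made-up profile the two branches are correlated, so the edge counts favor A to C (55 of 100) and D to E (60 of 100), and the greedy rule assembles a trace that ran only 15 times. The path profile identifies the path that ran 45 times. A trace built from the edge counts would leave early on most executions, which is the kind of early exit the abstract refers to.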
