A new framework for integrated global local scheduling

Global instruction schedulers can be classified as either structure-driven or profile-driven. Structure-driven approaches attempt to find instruction-level parallelism by redistributing instructions along all possible execution paths. When resources are limited, poor choices may penalize the frequently executed paths. By contrast, profile-driven approaches use feedback information to identify frequently executed (hot) regions and attempt to improve their performance. This may come at the expense of less frequently executed (cold) regions, for instance through the insertion of fixup code. Overall performance improves if the frequency information is accurate and the program has a dominant trace. If either condition does not hold, performance may degrade. We present a novel algorithm that attempts to combine the individual merits of these two approaches while avoiding some of their drawbacks. We have also incorporated several techniques that improve global scheduling performance on out-of-order (OOO) processors. Our algorithm is integrated with a parametric resource model and can be applied both before and after register allocation. It has been implemented in the SGI MIPSpro compiler, and the results have been evaluated on the MIPS R8000 and R10000 processors.
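
The trade-off behind profile-driven scheduling can be illustrated with a small expected-value calculation. The sketch below is not the algorithm presented in this paper; it is a minimal model, under assumed numbers, of why a code motion that speeds up a hot path but adds fixup code to a cold path pays off only when the profile shows a sufficiently dominant trace. The function names, the cycle counts, and the two-path region are all hypothetical.

```python
def expected_gain(hot_prob, hot_savings, cold_penalty):
    """Expected cycles saved per execution of a two-path region.

    hot_prob     -- profiled probability of taking the hot path (assumed input)
    hot_savings  -- cycles saved on the hot path by the code motion
    cold_penalty -- cycles added on the cold path by fixup code
    """
    return hot_prob * hot_savings - (1.0 - hot_prob) * cold_penalty


def break_even_probability(hot_savings, cold_penalty):
    """Hot-path probability at which the motion neither helps nor hurts."""
    return cold_penalty / (hot_savings + cold_penalty)


if __name__ == "__main__":
    # Hypothetical costs: 2 cycles saved on the hot path,
    # 3 cycles of fixup code added on the cold path.
    savings, penalty = 2, 3
    print("break-even probability:", break_even_probability(savings, penalty))
    for p in (0.9, 0.7, 0.5):
        print("hot_prob =", p, "expected gain =", expected_gain(p, savings, penalty))
```

With these assumed costs the motion is profitable only if the hot path is taken more than three times in five. A region with no dominant trace, or a profile that overstates the hot-path frequency, turns the same transformation into a net loss, which is the degradation case described above.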