A heuristic for global code motion

Programs that are not loop intensive and that have small basic blocks present a challenge to architectures that rely on instruction-level parallelism to deliver high performance. This paper presents a heuristic for moving operations across basic block boundaries, guided by profile information gathered from previous executions of the program. Architectural support is required to execute the moved operations speculatively. Performance improvements on a heapsort routine are discussed for a Very Long Instruction Word (VLIW) machine model under two different sets of operation latencies. Experiments were performed both at the assembly code level and on the intermediate representation produced by the compiler. The results show a speedup of about a factor of two over the original code after the global code motion heuristic is applied.