Meld scheduling: relaxing scheduling constraints across region boundaries

Meld scheduling melds the schedules of neighboring scheduling regions to respect latencies of operations issued in one region but completing after control transfers to the other. In contrast, conventional schedulers ignore latency constraints from other regions leading to potentially avoidable stalls in an interlocked (superscalar) machine or incorrect schedules for non-interlocked (VLIW) machines. Alternatively, schedulers that conservatively require all operations to complete before the branch rakes effect produce inefficient schedules. In this paper, we present general data structures for maintaining latency constraint information at region boundaries. We present a meld scheduling algorithm that generates latency constraints at the boundaries of scheduled regions and utilizes this information during the scheduling of other regions. We present a range of design options and describe the reasons behind our particular choices. We cover certain pitfalls and discuss how to develop an algorithm that addresses these issues. We evaluate the performance of meld scheduling on a range of noninterlocked machine models on a set of SPEC 92 and Unix benchmarks. We also investigate the sensitivity of the performance improvements due to changes in issue width and instruction latencies.