Code optimization of pipeline constraints

A pipelined processor divides the execution of an instruction over several independent units, called pipestages. Multiple instructions are executed concurrently, each occupying a different pipestage. This organization creates hazards if an instruction needs a result computed by an earlier instruction that is still executing. We distinguish between timing hazards, which result from data dependencies in the instruction stream, and sequencing hazards, which arise from changes in control flow. A pipelined architecture uses a pipeline interlock mechanism to prevent the execution of a machine instruction whenever a hazard is present. The interlock mechanism slows down the execution of programs, but it is needed because the instruction stream is not tailored for execution on the pipelined processor. An alternative to this complex piece of hardware is to rearrange the instructions at compile time to avoid pipeline interlocks. If no useful arrangement can be found, the compiler has to insert no-ops to prevent illegal execution sequences.

For timing hazards, the compiler separates data-dependent instructions so that no pipeline conflicts remain. We investigate two approaches to removing timing hazards at compile time. The first is postpass reorganization: a conventional compiler generates code without concern for pipeline constraints, and a separate phase of the compiling system reorganizes the resulting sequence of machine instructions. The basic problem of reorganizing machine-level instructions at compile time is shown to be NP-complete. A heuristic algorithm is proposed, and its properties and effectiveness are explored. We then investigate a second approach that combines register allocation and instruction scheduling; however, the results achievable by this approach are inferior to those of postpass optimization.

Delayed branches remove sequencing hazards but place restrictions on the instructions that directly follow the branch instruction. These successor instructions are always executed, whether the branch is taken or not, and the compiler must guarantee that only innocuous instructions are executed. We present an algorithm that selects the most promising instructions based upon a static prediction of the behaviour of the branch.

Code optimization yields a dynamic improvement of about 25% compared with the default solution, the insertion of no-ops. Furthermore, it allows the processor to be implemented with substantially less hardware complexity and facilitates a faster basic cycle time. We conclude that hazard removal at compile time is very attractive. Throughout this thesis, we give empirical data from MIPS, a VLSI design of a streamlined instruction set processor developed at Stanford University.
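
To make the timing-hazard discussion concrete, the following is a minimal sketch (in Python) of a greedy postpass reorganizer for a single basic block, assuming a one-cycle hazard distance: an instruction may not read a register in the cycle immediately after the instruction that writes it. The Instr record, the helper names, and this hazard model are illustrative assumptions for the sketch, not the reorganizer or the MIPS toolchain described in the thesis.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str
    dst: str | None = None
    srcs: tuple[str, ...] = ()

NOP = Instr("nop")

def reorganize(block: list[Instr]) -> list[Instr]:
    """Reorder 'block' so that no instruction reads a register written by
    its immediate predecessor; insert a no-op when no legal choice exists."""
    remaining = list(block)
    scheduled: list[Instr] = []
    while remaining:
        last = scheduled[-1] if scheduled else None
        choice = None
        for i, cand in enumerate(remaining):
            earlier = remaining[:i]
            # Preserve true, output and anti-dependences on unscheduled
            # instructions that precede 'cand' in the original order.
            if any(e.dst and (e.dst in cand.srcs or e.dst == cand.dst) for e in earlier):
                continue
            if any(cand.dst and cand.dst in e.srcs for e in earlier):
                continue
            # Timing hazard: do not read the register written by the
            # instruction scheduled in the previous cycle.
            if last and last.dst and last.dst in cand.srcs:
                continue
            choice = i
            break
        if choice is None:
            scheduled.append(NOP)          # no legal instruction: pad with a no-op
        else:
            scheduled.append(remaining.pop(choice))
    return scheduled

# Example: the add reads r1 one cycle after the lw that loads it, so the
# reorganizer moves the independent sub between them and no no-op is needed.
code = [Instr("lw", "r1", ("r2",)),
        Instr("add", "r3", ("r1", "r4")),
        Instr("sub", "r5", ("r6", "r7"))]
print([i.op for i in reorganize(code)])    # ['lw', 'sub', 'add']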
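
For the delayed-branch discussion, a similarly hedged sketch of filling a single delay slot is given below, reusing the Instr record from the previous sketch. The single-slot assumption, the liveness set passed in, and the use of a precomputed static prediction (e.g. predicting backward branches as taken) are assumptions made for this example; the thesis's algorithm for selecting the most promising instructions is not reproduced here.

def fill_delay_slot(block: list[Instr], branch: Instr,
                    predicted_first: Instr | None,
                    other_path_live: set[str]) -> Instr:
    """Pick one instruction for the delay slot after 'branch', or a no-op.

    block            -- instructions before the branch in the same basic block
    predicted_first  -- first instruction of the statically predicted successor
    other_path_live  -- registers live on entry to the other successor
    """
    # Best case: the instruction just before the branch fills the slot
    # whenever the branch condition does not depend on its result; the slot
    # is executed on both paths, so the program is unchanged.
    if block and not (block[-1].dst and block[-1].dst in branch.srcs):
        return block.pop()

    # Otherwise copy the first instruction of the predicted successor,
    # provided it is innocuous when the prediction turns out wrong, i.e.
    # its result is dead on the other path.  (The code at the predicted
    # successor would then be adjusted so it is not executed twice.)
    if (predicted_first is not None and predicted_first.dst
            and predicted_first.dst not in other_path_live):
        return predicted_first

    return Instr("nop")            # no harmless candidate: insert a no-op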