An Instruction Stream Compression Technique 1

The performance of instruction memory is a critical factor both for large, high-performance applications and for embedded systems. In high-performance systems, the bandwidth to the instruction cache can be the limiting factor for execution speed; in embedded systems, code density is often the critical factor. In this report we demonstrate a straightforward technique for compressing a program’s instruction stream. After code generation, the instruction stream is analysed for frequently reused sequences of instructions within the program’s basic blocks. These multi-instruction patterns are then mapped onto single-byte opcodes, so that a sequence of several multi-byte operations is compressed into a single byte. When a compressed opcode is detected during the instruction fetch cycle of program execution, it is expanded within the CPU into the original (multi-cycle) sequence of instructions. Because we operate only within basic blocks, branch instructions and their targets are unaffected by this technique. We provide statistics gathered from code generated for the Intel Pentium and PowerPC processors. We have found that, by incorporating a 1K decode ROM in the CPU, we can reduce a program’s code size by between 45% and 60%.

1. Material discussed in this technical report is undergoing patent review at the University of Michigan’s Technology Management Office.
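The compression step summarised in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names (`build_dictionary`, `compress_block`, `expand_block`), the string mnemonics standing in for encoded instructions, the `C0`, `C1`, ... tokens standing in for single-byte opcodes, and the greedy ranking by instructions saved are all assumptions made for illustration. A real tool would work on encoded bytes and rank candidates by bytes saved.

```python
from collections import Counter


def build_dictionary(blocks, num_codes=4, min_len=2, max_len=4):
    """Pick frequently repeated instruction sequences, never crossing a block boundary."""
    counts = Counter()
    for block in blocks:
        for n in range(min_len, max_len + 1):
            for i in range(len(block) - n + 1):
                counts[tuple(block[i:i + n])] += 1
    # Keep only sequences that actually repeat.
    repeated = [(seq, cnt) for seq, cnt in counts.items() if cnt > 1]
    # Assumed heuristic: rank by instructions saved, (length - 1) per occurrence.
    # Overlaps between candidates are ignored in this sketch.
    repeated.sort(key=lambda item: (-(len(item[0]) - 1) * item[1], item[0]))
    # "C0", "C1", ... are hypothetical stand-ins for single-byte compressed opcodes.
    return {seq: f"C{idx}" for idx, (seq, _) in enumerate(repeated[:num_codes])}


def compress_block(block, dictionary):
    """Replace dictionary sequences left to right, preferring the longest match."""
    by_length = sorted(dictionary, key=len, reverse=True)
    out, i = [], 0
    while i < len(block):
        for seq in by_length:
            if tuple(block[i:i + len(seq)]) == seq:
                out.append(dictionary[seq])
                i += len(seq)
                break
        else:
            # No dictionary sequence starts here; keep the instruction as is.
            out.append(block[i])
            i += 1
    return out


def expand_block(block, dictionary):
    """Model the decode step: expand each compressed opcode to its original sequence."""
    inverse = {code: seq for seq, code in dictionary.items()}
    out = []
    for item in block:
        out.extend(inverse.get(item, (item,)))
    return out
```

Because candidate sequences are only ever taken from inside a single block, no compressed opcode can span a branch or a branch target, which mirrors the property claimed in the abstract. `expand_block` plays the role of the decode ROM lookup in this model.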