Superthreading: a multithreaded approach to microprocessor design

Current superscalar processors attempt to increase instruction-level parallelism by adding functional units. An analytical model is used to show that this technique faces diminishing returns and that high single-stream instruction-level parallelism may be difficult to achieve. Two alternatives have been suggested for achieving higher performance: following multiple flows of control and executing instructions speculatively. This work investigates the former. It presents an architecture, called "superthreading", that simultaneously executes instructions from multiple threads on multiple execution units. The architecture provides direct hardware support for a set of threads, each of which is managed by a unit called the thread handler. Instructions are dispatched from thread handlers to execution units through a crossbar-like circuit. Several threads from different processes may be active at the same time. The mechanisms for instruction fetch and synchronization, as well as some operating system issues, are discussed. An implementation of the architecture can be made binary compatible with an existing ISA. An analytical model of the proposed architecture is also developed. It includes an exact model of a crossbar under the dependent reference model. The crossbar model allows different request probabilities for different output ports, and it allows a request to target any one output of a given type rather than a specific output. The architecture is modeled for different numbers of thread handlers and execution units, and the effect of several other parameters on performance is studied. A baseline system with just four execution units achieves an average execution rate of 3 instructions per cycle. Branches and cache misses have very little effect on performance.
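
The dispatch path described above can be illustrated with a small Monte Carlo sketch. It is not the exact analytical crossbar model developed in this work, only a simulation under stated assumptions: each thread handler offers at most one instruction per cycle, a request names a unit type rather than a specific unit, arbitration among handlers is random, and a blocked request is retried unchanged in the next cycle (one simple reading of the dependent reference model). The function name, its parameters, and the unit types and probabilities in the usage note are all hypothetical.

```python
import random


def simulate_dispatch(num_handlers, units_per_type, type_probs, cycles, seed=0):
    """Estimate instructions dispatched per cycle through a typed crossbar.

    Hypothetical parameters (illustrative, not taken from the paper):
      num_handlers   -- number of thread handlers, one pending request each
      units_per_type -- dict: unit type name -> number of units of that type
      type_probs     -- dict: unit type name -> request probability
      cycles         -- number of cycles to simulate
    """
    rng = random.Random(seed)
    types = list(type_probs)
    weights = [type_probs[t] for t in types]

    # Every handler starts with one pending request for some unit type.
    pending = [rng.choices(types, weights)[0] for _ in range(num_handlers)]
    dispatched = 0

    for _ in range(cycles):
        free = dict(units_per_type)   # all units are free at the start of a cycle
        order = list(range(num_handlers))
        rng.shuffle(order)            # assumed random arbitration among handlers
        for h in order:
            wanted = pending[h]
            if free.get(wanted, 0) > 0:
                # Any free unit of the requested type can serve the request.
                free[wanted] -= 1
                dispatched += 1
                pending[h] = rng.choices(types, weights)[0]
            # Otherwise the same request stays pending and is retried next cycle.

    return dispatched / cycles
```

A call such as `simulate_dispatch(8, {"alu": 2, "mem": 1, "fp": 1}, {"alu": 0.5, "mem": 0.3, "fp": 0.2}, 10_000)` gives a rough dispatch rate for eight handlers sharing four execution units. The result is bounded above by both the number of handlers and the total number of units. It ignores dependences, branches, and cache misses, so it should not be compared directly with the figures reported here.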