Improving context-based load value prediction (instruction-level parallelism)
暂无分享,去创建一个
Microprocessors are becoming faster at such a rapid pace that other components like random access memory cannot keep up. As a result, the latency of load instructions grows constantly and already often impedes processor performance.
Fortunately, load instructions frequently fetch predictable sequences of values. Load value predictors exploit this behavior to predict the results of load instructions. Because the predicted values are available before the memory can deliver the true load values, the CPU is able to speculatively continue processing without having to wait for memory accesses to complete, which improves the execution speed.
The contributions of this dissertation to the area of load value prediction include a novel technique to decrease the number of mispredictions, a predictor design that increases the hardware utilization and thus the number of correctly predicted load values, a detailed analysis of hybrid predictor combinations to determine components that complement each other well, and several approaches to substantially reduce the size of hybrid load value predictors without affecting their performance.
One result of this research is a very small yet high-performing load value predictor. Cycle-accurate simulations of a four-way superscalar microprocessor running SPECint95 show that this predictor outperforms other predictors from the literature by twenty or more percent over a wide range of sizes. With about fifteen kilobytes of state, the smallest examined configuration, it surpasses the speedups delivered by other, five-times larger predictors both with a re-fetch and a re-execute misprediction recovery mechanism.