Area-Aware Optimizations for Resource Constrained Branch Predictors Exploited in Embedded Processors

Modern embedded processors (e.g., Intel's XScale) use small, simple branch predictors to improve performance. Such predictors impose little area and power overhead but may offer low accuracy, so the branch misprediction rate can be high. Mispredictions lengthen program runtime and waste activity. To address this inefficiency, we introduce two optimization techniques. First, we introduce an adaptive, low-complexity branch prediction technique. Our branch predictor removes up to 50% of the branch mispredictions of a bimodal predictor, improving performance by up to 16%. Second, we present front-end gating techniques that reduce wasted activity by up to 32%.
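
For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional bimodal predictor: a table of 2-bit saturating counters indexed by low-order bits of the branch address. It illustrates only the baseline the abstract compares against, not the adaptive technique or the front-end gating proposed here. The class name, the 128-entry table size, the 4-byte instruction alignment, and the weakly-not-taken initial state are all assumptions chosen for illustration.

```python
class BimodalPredictor:
    """Baseline bimodal predictor: one 2-bit saturating counter per entry.

    Counter values 0-1 predict not-taken; values 2-3 predict taken.
    All parameters below are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, index_bits=7):
        self.size = 1 << index_bits          # assumed table size: 128 entries
        self.mask = self.size - 1
        self.counters = [1] * self.size      # assumed initial state: weakly not-taken

    def _index(self, pc):
        # Drop the two low bits (assumes 4-byte aligned instructions),
        # then keep the low-order index bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, trace):
    """Run a trace of (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch: taken nine times, then falls through once.
    loop_trace = [(0x400, True)] * 9 + [(0x400, False)]
    print(count_mispredictions(BimodalPredictor(), loop_trace))
```

Because the table is small, distinct branches whose addresses share the same index bits update the same counter. This aliasing is one reason a resource-constrained bimodal predictor can have a high misprediction rate.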