Loop Detection for Energy-Aware High Performance Embedded Processors

The energy consumed in instruction fetching accounts for a significant portion of total processor energy consumption, so both energy consumption and performance must be considered when designing high performance embedded processors. In this paper, we present a hardware-based loop detection technique that reduces the energy consumed in the instruction fetch unit (the instruction cache and the branch prediction logic) of high performance embedded processors. The proposed instruction fetch unit reduces instruction cache energy by replacing accesses to the large main instruction cache with accesses to a small selectively accessed cache (SAC). It also reduces branch prediction energy by eliminating unnecessary accesses to the branch prediction logic. We evaluate the proposed design using a simulation infrastructure based on SimpleScalar and CACTI. Simulation results show that the proposed technique reduces the energy consumed in the instruction cache and the branch prediction logic by 20% and 24% on average, respectively, with little performance loss compared to the conventional scheme.
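To make the fetch-path idea concrete, the following Python sketch counts accesses in a toy, trace-driven model. It is not the paper's detection logic, which the abstract does not describe. Everything here is a hypothetical loop-buffer-style stand-in: the `Fetch` and `FetchStats` records, the `simulate` and `loop_trace` helpers, the 16-entry `sac_entries` capacity, treating a taken backward branch as a loop, and the assumption that every fetch outside a captured loop looks up the branch predictor while fetches from a captured loop look up the predictor only for real branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fetch:
    """One fetched instruction in a hypothetical fetch trace."""
    pc: int
    is_branch: bool = False
    taken: bool = False
    target: int = 0


@dataclass
class FetchStats:
    icache: int = 0      # accesses to the large main instruction cache
    sac: int = 0         # accesses to the small selectively accessed cache
    bp_lookups: int = 0  # branch-predictor lookups


def simulate(trace, sac_entries=16, insn_bytes=4):
    """Toy loop-buffer model of a SAC-backed fetch unit (illustrative only)."""
    stats = FetchStats()
    loop = None     # (start_pc, end_pc) of the most recently detected loop
    active = False  # True while the loop body is assumed resident in the SAC
    for f in trace:
        in_loop = loop is not None and loop[0] <= f.pc <= loop[1]
        if active and in_loop:
            stats.sac += 1
            # Assumption: SAC entries carry predecode bits, so only actual
            # branches consult the predictor while the loop is captured.
            if f.is_branch:
                stats.bp_lookups += 1
        else:
            active = False  # outside a captured loop: fetch from the main cache
            stats.icache += 1
            # Assumption: before decode, every fetch looks up the predictor.
            stats.bp_lookups += 1
        # Detection: a taken backward branch whose body fits in the SAC.
        if f.is_branch and f.taken and f.target < f.pc:
            body = (f.pc - f.target) // insn_bytes + 1
            if body <= sac_entries:
                if loop == (f.target, f.pc):
                    # Same loop seen again: its body is now assumed in the SAC.
                    active = True
                else:
                    # New candidate loop: the next pass fills the SAC.
                    loop, active = (f.target, f.pc), False
    return stats


def loop_trace(iterations, start=0x100, body=4, insn_bytes=4):
    """Build a trace for a simple counted loop of `body` instructions."""
    trace = []
    last = start + insn_bytes * (body - 1)
    for i in range(iterations):
        for k in range(body - 1):
            trace.append(Fetch(start + insn_bytes * k))
        # Loop-closing branch: taken on every iteration except the last.
        trace.append(Fetch(last, is_branch=True,
                           taken=i < iterations - 1, target=start))
    trace.append(Fetch(last + insn_bytes))  # first instruction after the loop
    return trace


stats = simulate(loop_trace(10))
print(stats)
```

For a 4-instruction loop executed 10 times, this model serves 32 of the 41 fetches from the SAC and performs 17 branch-predictor lookups instead of 41, while a loop body larger than the SAC is never captured. Weighting these counts with per-access energies (for example, estimates from CACTI) would turn them into energy figures, but any such savings depend entirely on the assumed costs and workload and are not the results reported above.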