Energy and Performance Trade-offs between Instruction Reuse and Trivial Computations for Embedded Applications

Instruction reuse (IR) and trivial computation (TC) elimination are two architectural techniques that aim at eliminating redundant code to better exploit instruction-level parallelism. While each has been studied extensively in isolation, this paper is the first to compare their relative efficiency, using applications from the embedded domain. It establishes the relationship between the two techniques by characterizing the arithmetic instructions that each of them detects. TC can only eliminate instructions in which one of the operands is zero or one, whereas IR has a potentially wider scope: it can eliminate any instruction that has been executed before with the same set of operand values. Despite this wider scope, we find that IR and TC eliminate about the same fraction of instructions, even if an infinitely large instruction reuse buffer is assumed (26% and 22% of the instructions, respectively). Another, quite surprising, finding is that the two techniques target largely different sets of instructions, which suggests that they can provide almost additive gains when combined. In combination, they eliminate 40% of the instructions they target. Finally, in terms of energy efficiency, we find that a processor with a 256-entry instruction reuse buffer uses 1% more energy than a processor without IR, whereas TC reduces energy consumption by 5.6%.
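The distinction between the two detection conditions can be illustrated with a minimal sketch. It is not the simulator or the hardware mechanism used in the paper: the function names, the trace format, the choice to key the reuse buffer on program counter plus operand values, the LRU replacement, and the specific list of trivial cases are all assumptions made for illustration. The abstract only states that TC requires an operand equal to zero or one and that IR requires a previous execution with the same operand values.

```python
from collections import OrderedDict


def is_trivial(op, a, b):
    """TC check (assumed cases): an operand is 0 or 1 and the result needs no ALU work."""
    if op == "add":
        return a == 0 or b == 0
    if op == "sub":
        return b == 0
    if op == "mul":
        return a in (0, 1) or b in (0, 1)
    if op == "div":
        return (a == 0 and b != 0) or b == 1
    return False


def classify(trace, buffer_entries=None):
    """Return the sets of trace indices that TC and IR would each eliminate.

    buffer_entries=None models an unbounded reuse buffer; an integer models
    a finite buffer with LRU replacement (an assumption of this sketch).
    """
    reuse_buffer = OrderedDict()  # key: (pc, operand a, operand b)
    tc_hits, ir_hits = set(), set()
    for i, (pc, op, a, b) in enumerate(trace):
        if is_trivial(op, a, b):
            tc_hits.add(i)
        key = (pc, a, b)
        if key in reuse_buffer:
            ir_hits.add(i)                    # same instruction, same operand values
            reuse_buffer.move_to_end(key)     # refresh LRU position
        else:
            reuse_buffer[key] = True
            if buffer_entries is not None and len(reuse_buffer) > buffer_entries:
                reuse_buffer.popitem(last=False)  # evict least recently used entry
    return tc_hits, ir_hits


# Hypothetical dynamic trace: (pc, opcode, operand a, operand b)
trace = [
    (0x10, "add", 5, 0),  # trivial, not seen before: TC only
    (0x14, "mul", 3, 4),  # neither
    (0x14, "mul", 3, 4),  # repeated operands: IR only
    (0x10, "add", 5, 0),  # trivial and repeated: both
    (0x18, "mul", 7, 1),  # trivial, not seen before: TC only
    (0x1C, "sub", 9, 2),  # neither
]

tc_hits, ir_hits = classify(trace)
combined = tc_hits | ir_hits  # instructions removed when both techniques are applied
overlap = tc_hits & ir_hits   # instructions that either technique could remove
```

In this toy trace, the union of the two sets is larger than either set alone because only one instruction is caught by both techniques, which is the situation the abstract describes as almost additive gains. Passing a small `buffer_entries` value shows how a finite reuse buffer lowers the IR hit count while leaving the TC count unchanged.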
