Improving processor performance by simplifying and bypassing trivial computations

During the course of a program's execution, a processor performs mangy trivial computations; that is, computations that can be simplified or where the result is zero, one, or equal to one of the input operands. This paper shows that, despite compiling a program with aggressive optimizations (-O3), approximately 30% of all arithmetic instructions, which account for 12% of all dynamic instructions, are trivial computations. The amount of trivial computation is not heavily dependent on the program's specific input values. Our results show that eliminating trivial computations dynamically at run-time yields an average speedup of 8% for a typical processor. Even for a very aggressive processor (i.e. one with no functional unit constraints), the average speedup is still 6%. It also is important to note that the area cost (i.e. hardware) required to dynamically detect and eliminate these trivial computations is very low, consisting of only a few comparators and multiplexers.

[1]  S. Richardson Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation , 1992 .

[2]  Jian Huang,et al.  Exploiting basic block value locality with block reuse , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[3]  Larry Rudolph,et al.  Accelerating multi-media processing by implementing memoing in multiplication and division units , 1998, ASPLOS VIII.

[4]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[5]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).

[6]  Jordi Tubella,et al.  The Performance Potential of Data Value Reuse , 1998 .

[7]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[8]  Michael J. Flynn,et al.  ON DIVISION AND RECIPROCAL CACHES , 1995 .

[9]  A. J. KleinOsowski,et al.  MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.

[10]  Antonio González,et al.  Dynamic removal of redundant computations , 1999, ICS '99.

[11]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[12]  Proceedings Eighth International Symposium on High Performance Computer Architecture , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[13]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[14]  John Flynn,et al.  Adapting the SPEC 2000 benchmark suite for simulation-based computer architecture research , 2001 .