Dynamically exploiting narrow width operands to improve processor power and performance

In general-purpose microprocessors, recent trends have pushed towards 64 bit word widths, primarily to accommodate the large addressing needs of some programs. Many integer problems, however, rarely need the full 64 bit dynamic range these CPUs provide. In fact, another recent instruction set trend has been increased support for sub-word operations (that is, manipulating data in quantities less than the full word size). In particular, most major processor families have introduced "multimedia" instruction set extensions that operate in parallel on several sub-word quantities in the same ALU. This paper notes that across the SPECint95 benchmarks, over half of the integer operation executions require 16 bits or less. With this as motivation, our work proposes hardware mechanisms that dynamically recognize and capitalize on these "narrow-bitwidth" instances. Both optimizations require little additional hardware, and neither requires compiler support. The first, power-oriented, optimization reduces processor power consumption by using aggressive clock gating to turn off portions of integer arithmetic units that will be unnecessary for narrow bitwidth operations. This optimization results in an over 50% reduction in the integer unit's power consumption for the SPECint95 and MediaBench benchmark suites. The second optimization improves performance by merging together narrow integer operations and allowing them to share a single functional unit. Conceptually akin to a dynamic form of MMX, this optimization offers speedups of 4.3%-6.2% for SPECint95 and 8.0%-10.4% for MediaBench.

[1]  Gary Gibson,et al.  The Metaflow architecture , 1991, IEEE Micro.

[2]  Chuan Yi Tang,et al.  A 2.|E|-Bit Distributed Algorithm for the Directed Euler Trail Problem , 1993, Inf. Process. Lett..

[3]  Michael D. Smith,et al.  A high-performance microarchitecture with hardware-programmable functional units , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[4]  Hector Sanchez,et al.  A 2.2 W, 80 MHz superscalar RISC microprocessor , 1994 .

[5]  Marc Tremblay,et al.  The visual instruction set (VIS) in UltraSPARC , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[6]  Bowen Alpern,et al.  Microparallelism and high-performance protein matching , 1995 .

[7]  Doug Hunt,et al.  Advanced performance features of the 64-bit PA-8000 , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[8]  Bowen Alpern,et al.  Microparallelism and High-Performance Protein Matching , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[9]  Uri C. Weiser,et al.  MMX technology extension to the Intel architecture , 1996, IEEE Micro.

[10]  John Wawrzynek,et al.  T0: A Single-Chip Vector Microprocessor with Reconfigurable Pipelines , 1996, ESSCIRC '96: Proceedings of the 22nd European Solid-State Circuits Conference.

[11]  Dileep Bhandarkar Alpha implementations and architecture - complete reference and guide , 1996 .

[12]  Ruby B. Lee Subword parallelism with MAX-2 , 1996, IEEE Micro.

[13]  P. Ng,et al.  Performance of CMOS differential circuits , 1996 .

[14]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[15]  Mary Jane Irwin,et al.  Transistor sizing for low power CMOS circuits , 1996, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[16]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[17]  Wolfgang Fichtner,et al.  Low-power logic styles: CMOS versus pass-transistor logic , 1997, IEEE J. Solid State Circuits.

[18]  Mikko H. Lipasti,et al.  The Performance Potential of Value and Dependence Prediction , 1997, Euro-Par.

[19]  Mike Alexander,et al.  Thermal management system for high performance PowerPC/sup TM/ microprocessors , 1997, Proceedings IEEE COMPCON 97. Digest of Papers.

[20]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[21]  Larry L. Biro,et al.  Power considerations in the design of the Alpha 21264 microprocessor , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[22]  Carole Dulong,et al.  The IA-64 Architecture at Work , 1998, Computer.