Exploiting data-width locality to increase superscalar execution bandwidth

In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit very strong temporal locality and describe mechanisms to accurately predict data-widths. To exploit the predictability of data-widths, we propose a Multi-Bit-Width (MBW) microarchitecture which, when the opportunity arises, takes the wires normally used to route the operands and bypass the result of a 64-bit instruction, and instead uses them for multiple narrow-width instructions. This technique increases the effective issue width without adding many additional wires by reusing, already existing datapaths. Compared to a traditional four-wide superscalar processor our best MBW configuration with a peak issue rate of eight IPC achieves a 7.1% speedup on the simulated SPECint2000 benchmarks, which performs very well when compared to a 7.9% speedup attainable by a processor with a perfect data-width predictor.

[1]  Ruby B. Lee Accelerating multimedia with enhanced microprocessors , 1995, IEEE Micro.

[2]  Yale N. Patt,et al.  A two-level approach to making class predictions , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[3]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[4]  Margaret Martonosi,et al.  Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.

[5]  Uri C. Weiser,et al.  MMX technology extension to the Intel architecture , 1996, IEEE Micro.

[6]  G.S. Sohi,et al.  Dynamic Speculation And Synchronization Of Data Dependence , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[7]  Todd M. Austin,et al.  MASE: a novel infrastructure for detailed microarchitectural modeling , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[8]  Mikko H. Lipasti,et al.  Value locality and load value prediction , 1996, ASPLOS VII.

[9]  Anant Agarwal,et al.  Virtual wires: overcoming pin limitations in FPGA-based logic emulators , 1993, [1993] Proceedings IEEE Workshop on FPGAs for Custom Computing Machines.

[10]  S. McFarling Combining Branch Predictors , 1993 .

[11]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[12]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, ISCA.

[13]  Margaret Martonosi,et al.  Selecting a Single, Representative Sample for Accurate Simulation of SPECint Benchmarks , 1999 .

[14]  Kenneth C. Yeager,et al.  200-MHz superscalar RISC microprocessor , 1996, IEEE J. Solid State Circuits.

[15]  Andreas Moshovos,et al.  Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.

[16]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[17]  Tong Liu,et al.  Performance improvement with circuit-level speculation , 2000, MICRO 33.

[18]  Hector Sanchez,et al.  A 2.2 W, 80 MHz superscalar RISC microprocessor , 1994 .

[19]  Todd M. Austin,et al.  Efficient dynamic scheduling through tag elimination , 2002, ISCA.

[20]  Eric Sprangle,et al.  Increasing processor performance by implementing deeper pipelines , 2002, ISCA.

[21]  Margaret Martonosi,et al.  Dynamically exploiting narrow width operands to improve processor power and performance , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[22]  Yale N. Patt,et al.  The effect of speculatively updating branch history on branch prediction accuracy, revisited , 1994, MICRO 27.

[23]  Joseph T. Rahmeh,et al.  Improving the accuracy of dynamic branch prediction using branch correlation , 1992, ASPLOS V.

[24]  T. Puzak,et al.  The optimum pipeline depth for a microprocessor , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[25]  Trevor N. Mudge,et al.  The bi-mode branch predictor , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[26]  Yale N. Patt,et al.  A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.

[27]  Andreas Moshovos,et al.  Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[28]  James E. Smith,et al.  A study of branch prediction strategies , 1981, ISCA '98.

[29]  Krste Asanovic,et al.  Dynamic zero compression for cache energy reduction , 2000, MICRO 33.

[30]  Pierre Michaud,et al.  Trading Conflict And Capacity Aliasing In Conditional Branch Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[31]  Norman P. Jouppi,et al.  Quantifying the Complexity of Superscalar Processors , 2002 .

[32]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.