Paper information - Exploiting data-width locality to increase superscalar execution bandwidth

Exploiting data-width locality to increase superscalar execution bandwidth

In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit very strong temporal locality and describe mechanisms to accurately predict data-widths. To exploit the predictability of data-widths, we propose a Multi-Bit-Width (MBW) microarchitecture which, when the opportunity arises, takes the wires normally used to route the operands and bypass the result of a 64-bit instruction and instead uses them for multiple narrow-width instructions. By reusing already existing datapaths, this technique increases the effective issue width without adding many additional wires. Compared to a traditional four-wide superscalar processor, our best MBW configuration, with a peak issue rate of eight IPC, achieves a 7.1% speedup on the simulated SPECint2000 benchmarks, which compares well with the 7.9% speedup attainable by a processor with a perfect data-width predictor.

Gabriel H. Loh

[1] Ruby B. Lee. Accelerating multimedia with enhanced microprocessors , 1995, IEEE Micro.

[2] Yale N. Patt,et al. A two-level approach to making class predictions , 2003, Proceedings of the 36th Annual Hawaii International Conference on System Sciences.

[3] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.

[4] Margaret Martonosi,et al. Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.

[5] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.

[6] G.S. Sohi,et al. Dynamic Speculation And Synchronization Of Data Dependence , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[7] Todd M. Austin,et al. MASE: a novel infrastructure for detailed microarchitectural modeling , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[8] Mikko H. Lipasti,et al. Value locality and load value prediction , 1996, ASPLOS VII.

[9] Anant Agarwal,et al. Virtual wires: overcoming pin limitations in FPGA-based logic emulators , 1993, Proceedings IEEE Workshop on FPGAs for Custom Computing Machines.

[10] S. McFarling. Combining Branch Predictors , 1993 .

[11] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .

[12] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, ISCA.

[13] Margaret Martonosi,et al. Selecting a Single, Representative Sample for Accurate Simulation of SPECint Benchmarks , 1999 .

[14] Kenneth C. Yeager,et al. 200-MHz superscalar RISC microprocessor , 1996, IEEE J. Solid State Circuits.

[15] Andreas Moshovos,et al. Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.

[16] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.

[17] Tong Liu,et al. Performance improvement with circuit-level speculation , 2000, MICRO 33.

[18] Hector Sanchez,et al. A 2.2 W, 80 MHz superscalar RISC microprocessor , 1994 .

[19] Todd M. Austin,et al. Efficient dynamic scheduling through tag elimination , 2002, ISCA.

[20] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002, ISCA.

[21] Margaret Martonosi,et al. Dynamically exploiting narrow width operands to improve processor power and performance , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[22] Yale N. Patt,et al. The effect of speculatively updating branch history on branch prediction accuracy, revisited , 1994, MICRO 27.

[23] Joseph T. Rahmeh,et al. Improving the accuracy of dynamic branch prediction using branch correlation , 1992, ASPLOS V.

[24] T. Puzak,et al. The optimum pipeline depth for a microprocessor , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[25] Trevor N. Mudge,et al. The bi-mode branch predictor , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[26] Yale N. Patt,et al. A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.

[27] Andreas Moshovos,et al. Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[28] James E. Smith,et al. A study of branch prediction strategies , 1981, ISCA '81.

[29] Krste Asanovic,et al. Dynamic zero compression for cache energy reduction , 2000, MICRO 33.

[30] Pierre Michaud,et al. Trading Conflict And Capacity Aliasing In Conditional Branch Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[31] Norman P. Jouppi,et al. Quantifying the Complexity of Superscalar Processors , 2002 .

[32] Mikko H. Lipasti,et al. Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.