Register integration: a simple and efficient implementation of squash reuse

Register integration (or simply integration) is a mechanism for incorporating speculative results directly into a sequential execution using data-dependence relationships. In this paper we use integration to implement squash reuse, the salvaging of instruction results that were needlessly discarded during the course of sequential recovery from a control- or datamis-speculation. To implement integration, we first allow the results of squashed instructions to remain in the physical register file past mis-speculation recovery. As the processor re-traces portions of the squashed path, integration logic examines each instruction as it is being renamed. Using an auxiliary table, this circuit searches the physical register file for the physical register belonging to the corresponding squashed instance of the instruction. If this register is found, integration succeeds and the squashed result is re-validated by a simple update of the rename table. Once integrated, an instruction is complete and may bypass the out-of-order core of the machine entirely. Integration reduces contention for queuing and execution resources, collapses dependent chains of instructions and accelerates the resolution of branches. It achieves this using only rename-table manipulations; no additional values are read from or written to the physical registers. Our preliminary evaluation shows that a minimal integration configuration can provide performance improvements of up to 8% when applied to current-generation micro-architectures and up to 11.5% when applied to more aggressive microarchitectures. Integration also reduces the amount of wasteful speculation in the machine, cutting the number of instructions executed by up to 15% and the number of instructions fetched along mis-speculated paths by as much as 6%.

[1]  Quinn Jacobson,et al.  Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[2]  Manoj Franklin,et al.  The multiscalar architecture , 1993 .

[3]  Gurindar S. Sohi,et al.  Pre-execution via speculative data-driven multithreading , 2001 .

[4]  Mateo Valero,et al.  Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[6]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[7]  John Paul Shen,et al.  Reducing branch misprediction penalties via dynamic control independence detection , 1999, ICS '99.

[8]  P. Glaskowski Pentium 4 (partially) previewed , 2000 .

[9]  Mikko H. Lipasti Value locality and speculative execution , 1998 .

[10]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[11]  ace P2SC IBM’s Power3 to Replace P2SC: 11/17/97 , 1997 .

[12]  Eric Rotenberg,et al.  Control independence in trace processors , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[13]  Stéphan Jourdan,et al.  Speculation techniques for improving load related instruction scheduling , 1999, ISCA.

[14]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[15]  G.S. Sohi,et al.  Dynamic Instruction Reuse , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[16]  Andreas Moshovos,et al.  Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[17]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.