Paper information - A group-commit mechanism for ROB-based processors implementing the X86 ISA

We introduce an alternative instruction commitment mechanism for a Reorder Buffer (ROB)-based out-of-order processor that commits a group of consecutive instructions atomically to support a larger instruction window. The proposed mechanism makes conservative use of the ROB by setting up entries only for the instructions that perform the latest update to a register within a group. Further, the destination registers of instructions in a group that do not hold the most recent updates to architectural registers can be released before the group containing these instructions is committed. The net result is an augmented ROB-based datapath that increases both the effective size of the ROB and the effective number of physical registers. The proposed design achieves average performance gains of about 10% and 16% on the SPEC integer and floating-point benchmarks, respectively, compared to a traditional ROB-based design. It also achieves a performance gain of slightly over 5% compared with an aggressive design that uses checkpoints and relatively complex hardware resources.
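The selection rule described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the `Instr` type, the `split_group` function, and the example register names are hypothetical, and the model ignores how instructions without a destination register (stores, branches) are tracked, as well as any condition on pending consumers that real early release would have to respect.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    seq: int              # program-order position within the group (hypothetical field)
    dest: Optional[str]   # architectural destination register, or None if there is none


def split_group(group: List[Instr]) -> Tuple[List[Instr], List[Instr]]:
    """Split one group into (latest writers, superseded writers).

    Latest writers are the instructions that perform the last update to a
    register within the group; per the abstract, only these get ROB entries.
    Superseded writers are overwritten later in the same group; their
    destination registers are candidates for release before group commit.
    """
    # First pass: record the last instruction that writes each register.
    last_writer: Dict[str, int] = {}
    for ins in group:
        if ins.dest is not None:
            last_writer[ins.dest] = ins.seq  # later writers replace earlier ones

    # Second pass: classify every instruction that has a destination.
    latest: List[Instr] = []
    superseded: List[Instr] = []
    for ins in group:
        if ins.dest is None:
            continue  # assumption: not modeled in this sketch
        if last_writer[ins.dest] == ins.seq:
            latest.append(ins)
        else:
            superseded.append(ins)
    return latest, superseded


# Example group: rax and rbx are each written twice, rcx once.
group = [
    Instr(0, "rax"),
    Instr(1, "rbx"),
    Instr(2, "rax"),
    Instr(3, None),
    Instr(4, "rbx"),
    Instr(5, "rcx"),
]
latest, superseded = split_group(group)
print([i.seq for i in latest])      # [2, 4, 5]
print([i.seq for i in superseded])  # [0, 1]
```

In this example, five instructions write a register but only three would need ROB entries, which is the sense in which the effective ROB size grows for a fixed number of entries.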

Hui Zeng | Furat Afram | Kanad Ghose
