A group-commit mechanism for ROB-based processors implementing the X86 ISA

We introduce an alternative instruction commit mechanism for a Reorder Buffer (ROB)-based out-of-order processor that commits a group of consecutive instructions atomically in order to support a larger instruction window. The proposed mechanism uses the ROB sparingly: within each group, it sets up entries only for the instructions that perform the latest update to a register. Further, the destination registers of instructions in a group that do not hold the most recent updates to architectural registers can be released before the group containing these instructions is committed. The net result is an augmented ROB-based datapath that increases both the effective size of the ROB and the effective number of physical registers. Compared to a traditional ROB-based design, the proposed design achieves average performance gains of about 10% and 16% on the SPEC integer and floating-point benchmarks, respectively. It also achieves a performance gain of slightly over 5% compared with an aggressive design that uses checkpoints and relatively complex hardware resources.
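
The selection rule described above can be illustrated with a minimal sketch. It is not the paper's hardware design: the function name `split_group`, the list-of-register-names representation of a group, and the example registers are all assumptions made for illustration. The sketch only shows which instructions in a group are the last writers of their architectural destination register (and would therefore need a ROB entry), and which instructions are overwritten later in the same group (and are therefore candidates for early release of their destination register). It does not model when an early release is actually safe, nor stores, branches, or exceptions.

```python
from typing import Dict, List, Optional, Tuple


def split_group(dests: List[Optional[str]]) -> Tuple[List[int], List[int]]:
    """Classify the instructions of one commit group (illustrative only).

    dests[i] is the architectural destination register of the i-th
    instruction in program order, or None if it writes no register.

    Returns (needs_entry, early_release):
      needs_entry   - indices of the last writer of each register in the group
      early_release - indices whose result is overwritten later in the group
    """
    # Record, per architectural register, the index of its last writer.
    last_writer: Dict[str, int] = {}
    for idx, reg in enumerate(dests):
        if reg is not None:
            last_writer[reg] = idx

    needs_entry = sorted(last_writer.values())
    early_release = [
        idx
        for idx, reg in enumerate(dests)
        if reg is not None and last_writer[reg] != idx
    ]
    return needs_entry, early_release


# Hypothetical six-instruction group; register names are placeholders.
group = ["rax", "rbx", "rax", None, "rbx", "rcx"]
needs_entry, early_release = split_group(group)
# needs_entry   == [2, 4, 5]  (last writers of rax, rbx, rcx)
# early_release == [0, 1]     (overwritten later within the group)
```

In this hypothetical group, only three of the five register-writing instructions would occupy ROB entries, which is the sense in which grouping can raise the effective ROB capacity.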
