Deconstructing commit

Many modern processors execute instructions out of their original program order to exploit instruction-level parallelism and achieve higher performance. However, even though instructions can execute in an arbitrary order, they must eventually commit, or retire from execution, in program order. This constraint provides a safety mechanism that ensures mis-speculated instructions are not inadvertently committed, but it can consume valuable processor resources and severely limit the degree of parallelism exposed in a program. We assert that this constraint is overly conservative, and propose conditions under which it can be relaxed. This paper deconstructs the notion of commit in an out-of-order processor and examines the set of necessary conditions under which instructions can be permitted to retire out of program order. It provides a detailed analysis of the frequency and relative importance of these conditions, and discusses microarchitectural modifications that relax the in-order commit requirement. Overall, we find that for a given set of processor resources our technique achieves speedups of up to 68% for floating-point benchmarks and up to 8% for integer benchmarks. Alternatively, because out-of-order commit uses cycle-time-limiting resources more efficiently, it can enable simpler designs with potentially higher clock frequencies.
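
To make the in-order commit constraint concrete, the following is a minimal toy sketch, not the paper's simulator or its proposed mechanism. It assumes a reorder buffer modeled as a queue of instruction labels and a set of labels that have finished executing; the names `in_order_commit`, `rob`, `completed`, and `width` are hypothetical and chosen only for illustration. It shows how a single incomplete instruction at the head keeps younger, already-completed instructions from retiring.

```python
from collections import deque


def in_order_commit(rob, completed, width):
    """Retire up to `width` instructions from the head of the ROB.

    Retirement stops at the first instruction that has not finished
    executing, even if younger instructions behind it are complete.
    """
    retired = []
    while rob and len(retired) < width and rob[0] in completed:
        retired.append(rob.popleft())
    return retired


# Hypothetical window: a long-latency load followed by three short operations.
rob = deque(["load", "add", "mul", "sub"])
completed = {"add", "mul", "sub"}  # everything except the load has executed

# The load at the head is incomplete, so nothing can retire this cycle.
stalled = in_order_commit(rob, completed, width=4)

# Once the load completes, the whole window drains in program order.
completed.add("load")
drained = in_order_commit(rob, completed, width=4)
```

In this toy model the three completed instructions keep occupying their entries until the load finishes, which is the resource cost the abstract attributes to in-order commit. The conditions under which such instructions could safely retire early are the subject of the paper and are not modeled here.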