A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions

Out-of-order retirement of instructions has been shown to be an effective technique to increase the number of in-flight instructions. This form of runtime scheduling can reduce pipeline stalls caused by head-of-line blocking effects in the reorder buffer (ROB). Expanding the width of the instruction window can be highly beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores encounter long latencies due to cache misses, and whose stalls must be overlapped with instruction execution to overcome the memory latencies. Based on the Validation Buffer (VB) architecture (a previously proposed out-of-order retirement, checkpoint-free architecture for single processors), this paper proposes a cost-effective, scalable, out-of-order retirement multiprocessor, capable of enforcing sequential consistency without impacting the design of the memory hierarchy or interconnect. Our simulation results indicate that utilizing a VB can speed up both relaxed and sequentially consistent in-order retirement in future multiprocessor systems by between 3 and 20 percent, depending on the ROB size.

[1]  Alvin M. Despain,et al.  The 16-fold way: a microparallel taxonomy , 1993, MICRO 1993.

[2]  Babak Falsafi,et al.  Speculative sequential consistency with little custom storage , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[3]  Anoop Gupta,et al.  Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.

[4]  T. N. Vijaykumar,et al.  Is SC + ILP = RC? , 1999, ISCA.

[5]  Mikko H. Lipasti,et al.  Deconstructing commit , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[6]  Hewlett-Packard THE HP PA-8000 RISC CPU , 2022 .

[7]  Sarita V. Adve,et al.  Designing memory consistency models for shared-memory multiprocessors , 1993 .

[8]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[9]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[10]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[11]  Josep Llosa,et al.  Out-of-order commit processors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[12]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.

[13]  Sarita V. Adve,et al.  Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models , 1997, SPAA '97.

[14]  Haitham Akkary,et al.  Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors , 2003, MICRO.

[15]  Mateo Valero,et al.  Implementing Kilo-Instruction Multiprocessors , 2005, ICPS '05. Proceedings. International Conference on Pervasive Services, 2005..

[16]  Denis Trystram,et al.  Parallel algorithms and architectures , 1995 .

[17]  José F. Martínez,et al.  Cherry-MP: correctly integrating checkpointed early resource recycling in chip multiprocessors , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[18]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[19]  Stamatis Vassiliadis,et al.  Register renaming and dynamic speculation: an alternative approach , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.

[20]  Josep Torrellas,et al.  The Bulk Multicore architecture for improved programmability , 2009, Commun. ACM.

[21]  S.P. Marti,et al.  A Complexity-Effective Out-of-Order Retirement Microarchitecture , 2009, IEEE Transactions on Computers.

[22]  Michel Dubois,et al.  Correct memory operation of cache-based multiprocessors , 1987, ISCA '87.