Post-Silicon Validation of Multiprocessor Memory Consistency

Shared-memory chip-multiprocessor (CMP) architectures define memory consistency models that establish the ordering rules for memory operations issued by multiple threads. Validating a CMP's implementation of its memory consistency model requires extensive monitoring and analysis of memory accesses while multiple threads execute on the CMP. In this paper, we present a low-overhead solution for observing, recording, and analyzing shared-memory interactions in an emulation and/or post-silicon validation environment. Our approach repurposes portions of the CMP's own data caches, augmented only by a small amount of hardware logic, to log information relevant to memory accesses. After this information is transferred to a central memory location, our analysis algorithm detects any possible memory consistency violations. The analysis relies on the property that a consistency violation corresponds to a cycle in an appropriately defined graph of memory interactions. Our solution lets a designer choose where to run the analysis algorithm: 1) on the CMP itself; 2) on a separate processor residing on the validation platform; or 3) off-line on a separate host machine. Our experimental results show an 83% bug detection rate on our testbed CMP across three distinct memory consistency models: relaxed memory order (RMO), total store order (TSO), and sequential consistency (SC). Finally, our solution can be disabled in the final product, incurring zero performance overhead, and its per-core area overhead is smaller than a physical integer register file in a modern processor.
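
The cycle property can be illustrated with a small constraint-graph checker. The Python sketch below is not the paper's logging format or analysis algorithm. It follows the commonly used construction with program-order, reads-from, write-serialization, and from-read edges, and it rests on our own simplifying assumptions. The log is a list of hypothetical Op records. Written values are unique and nonzero per address, every address starts at 0, the per-address write-serialization order is known, there are no fences, and only SC and a simplified TSO are modeled (RMO is omitted). A complete TSO checker would also need a separate per-address coherence check, which this sketch leaves out.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    tid: int   # issuing thread (-1 marks the implicit initial write)
    seq: int   # position in the thread's program order
    kind: str  # "R" or "W"
    addr: int
    val: int   # value written, or value returned by the read


def build_graph(log, coherence, model="SC"):
    """Build a simplified constraint graph from a (hypothetical) access log.

    Assumes written values are unique and nonzero per address, each address
    starts at 0, and `coherence` maps an address to its written values in
    write-serialization order.
    """
    nodes, edges = set(log), defaultdict(set)

    # Program-order edges; under TSO a later load may bypass an earlier store.
    threads = defaultdict(list)
    for op in log:
        threads[op.tid].append(op)
    for ops in threads.values():
        ops.sort(key=lambda o: o.seq)
        for i, a in enumerate(ops):
            for b in ops[i + 1:]:
                if not (model == "TSO" and a.kind == "W" and b.kind == "R"):
                    edges[a].add(b)

    # Write-serialization edges, starting from an implicit initial write of 0.
    writer = {(op.addr, op.val): op for op in log if op.kind == "W"}
    chains = {}
    for addr in {op.addr for op in log}:
        init = Op(-1, 0, "W", addr, 0)
        nodes.add(init)
        writer[(addr, 0)] = init
        chains[addr] = [init] + [writer[(addr, v)] for v in coherence.get(addr, [])]
        for a, b in zip(chains[addr], chains[addr][1:]):
            edges[a].add(b)

    for r in (op for op in log if op.kind == "R"):
        src = writer[(r.addr, r.val)]
        # Reads-from edge; under TSO a thread may read its own buffered store
        # early, so only reads-from across threads constrains the global order.
        if model == "SC" or src.tid != r.tid:
            edges[src].add(r)
        # From-read edges: r precedes every write serialized after its source.
        chain = chains[r.addr]
        for w in chain[chain.index(src) + 1:]:
            edges[r].add(w)
    return nodes, edges


def has_cycle(nodes, edges):
    """Kahn's algorithm: if some node never reaches in-degree 0, a cycle exists."""
    indeg = dict.fromkeys(nodes, 0)
    for succs in edges.values():
        for b in succs:
            indeg[b] += 1
    ready = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while ready:
        n = ready.popleft()
        visited += 1
        for m in edges.get(n, ()):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited != len(nodes)


def violates(log, coherence, model="SC"):
    """Report a consistency violation if the constraint graph is cyclic."""
    return has_cycle(*build_graph(log, coherence, model))


# Store-buffering outcome (x is address 0, y is address 1): both loads see 0.
sb = [Op(0, 0, "W", 0, 1), Op(0, 1, "R", 1, 0),
      Op(1, 0, "W", 1, 1), Op(1, 1, "R", 0, 0)]
print(violates(sb, {0: [1], 1: [1]}, "SC"))   # True: forbidden under SC
print(violates(sb, {0: [1], 1: [1]}, "TSO"))  # False: allowed under TSO
```

The store-buffering example shows why the graph depends on the model. Under SC the store-to-load program-order edges close the cycle. TSO drops those edges, so the same outcome is legal. Kahn's algorithm was chosen only because it detects a cycle in linear time without recursion.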
