A formal model of data access for multicore architectures with multilevel caches

Abstract

The performance of software running on parallel or distributed architectures can be severely affected by the location of data. In shared memory multicore architectures, data movement between caches and main memory is driven by data accesses from tasks executing in parallel on different cores and by a protocol that ensures cache coherence. This paper integrates cache coherence into a formal model of data access, to capture such data movement from an application perspective. We develop an executable model that captures cache coherent data movement between different cache levels and main memory, for software described by task-level data access patterns. The proposed model is generic in the number of cache levels and cores, and abstracts from the concrete communication medium. We show that the model guarantees the expected correctness properties of cache coherence, in particular data consistency. This paper further presents a proof-of-concept implementation of the proposed model in rewriting logic, which makes it possible to specify and compare, at the modelling level, different choices of the underlying hardware architecture for dynamically created parallel data access patterns.
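To make the idea of cache coherent data movement across a configurable number of cores and cache levels more concrete, the following Python code gives a minimal sketch. It is a hypothetical toy model, not the paper's rewriting-logic implementation, and it rests on several assumptions that are not taken from the abstract: an MSI-style write-invalidate protocol, cache levels that are private to each core, no evictions, and coherence actions applied atomically in place of a concrete communication medium. The names `CoherentSystem`, `read`, `write` and `coherent` are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    # Invalid is represented by the address being absent from a cache.


@dataclass
class Line:
    state: State
    value: int


class CoherentSystem:
    """Hypothetical toy MSI write-invalidate model (not the paper's model).

    Each of `n_cores` cores has `n_levels` private caches; caches[c][0] is L1.
    A cache maps address -> Line. Coherence actions happen atomically, which
    abstracts from the interconnect between caches and main memory.
    """

    def __init__(self, n_cores: int, n_levels: int) -> None:
        self.memory: dict[int, int] = {}
        self.caches = [[{} for _ in range(n_levels)] for _ in range(n_cores)]

    def _flush_owner(self, addr: int) -> None:
        # Write back a Modified copy (if any) to memory and downgrade it to Shared.
        for core in self.caches:
            for cache in core:
                line = cache.get(addr)
                if line is not None and line.state is State.MODIFIED:
                    self.memory[addr] = line.value
                    line.state = State.SHARED

    def read(self, c: int, addr: int) -> int:
        levels = self.caches[c]
        for i, cache in enumerate(levels):
            line = cache.get(addr)
            if line is not None:
                # Hit at level i: copy the value into the levels closer to the core.
                for closer in levels[:i]:
                    closer[addr] = Line(State.SHARED, line.value)
                return line.value
        # Miss in every private level: any owner flushes, then fetch from memory.
        self._flush_owner(addr)
        value = self.memory.get(addr, 0)
        for cache in levels:
            cache[addr] = Line(State.SHARED, value)
        return value

    def write(self, c: int, addr: int, value: int) -> None:
        # Invalidate every other copy, including core c's own L2 and beyond,
        # so that core c's L1 holds the single Modified copy.
        for core_id, core in enumerate(self.caches):
            for level, cache in enumerate(core):
                if core_id != c or level > 0:
                    cache.pop(addr, None)
        self.caches[c][0][addr] = Line(State.MODIFIED, value)

    def coherent(self, addr: int) -> bool:
        lines = [cache[addr] for core in self.caches for cache in core if addr in cache]
        if any(line.state is State.MODIFIED for line in lines):
            # Single writer: a Modified copy excludes all other valid copies.
            return len(lines) == 1
        # Data value: every Shared copy agrees with main memory.
        return all(line.value == self.memory.get(addr, 0) for line in lines)
```

Because a write leaves exactly one Modified copy in the writer's L1, and every Shared copy is filled from main memory after any owner has flushed, a read on any core returns the most recently written value. The `coherent` check states this as a single-writer/multiple-reader condition together with agreement between Shared copies and main memory.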
