Streaming memory consistency for efficient MPSoC design

Multiprocessor systems-on-chip (MPSoC) with distributed shared memory and caches are flexible when it comes to inter-processor communication but require an efficient memory consistency and cache coherency solution. In this paper we present a novel consistency model, streaming consistency, for the streaming domain in which tasks communicate through circular buffers. The model allows more reordering than release consistency and, among other optimizations, enables an efficient software cache coherency solution and posted writes. We also present a software cache coherency implementation and discuss a software circular buffer administration that does not need an atomic read-modify-write instruction. A small experiment demonstrates the potential performance increase of posted writes in MPSoCs with high communication latencies.
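The abstract does not describe how the circular buffer administration avoids an atomic read-modify-write instruction. One common way to get this property, shown here only as a minimal sketch and not as the paper's implementation, is a single-producer, single-consumer buffer in which each counter has exactly one writer. All names (`ring_t`, `ring_put`, `ring_get`, `RING_CAPACITY`) are hypothetical. The C11 acquire/release loads and stores stand in for whatever ordering and cache maintenance operations a real MPSoC with software cache coherency would require.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 8u /* hypothetical size; a power of two keeps indexing correct when the counters wrap */

typedef struct {
    int32_t slots[RING_CAPACITY];
    _Atomic uint32_t write_count; /* stored only by the producer */
    _Atomic uint32_t read_count;  /* stored only by the consumer */
} ring_t;

void ring_init(ring_t *r) {
    atomic_init(&r->write_count, 0);
    atomic_init(&r->read_count, 0);
}

/* Producer side: returns false if the buffer is full. */
bool ring_put(ring_t *r, int32_t value) {
    uint32_t w  = atomic_load_explicit(&r->write_count, memory_order_relaxed);
    uint32_t rd = atomic_load_explicit(&r->read_count, memory_order_acquire);
    if (w - rd == RING_CAPACITY) {
        return false; /* no free slot */
    }
    r->slots[w % RING_CAPACITY] = value;
    /* Publish the data with a plain store; no read-modify-write is needed
       because the producer is the only writer of write_count. */
    atomic_store_explicit(&r->write_count, w + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
bool ring_get(ring_t *r, int32_t *out) {
    uint32_t rd = atomic_load_explicit(&r->read_count, memory_order_relaxed);
    uint32_t w  = atomic_load_explicit(&r->write_count, memory_order_acquire);
    if (w == rd) {
        return false; /* nothing to read */
    }
    *out = r->slots[rd % RING_CAPACITY];
    /* Release the slot with a plain store; the consumer is the only
       writer of read_count. */
    atomic_store_explicit(&r->read_count, rd + 1, memory_order_release);
    return true;
}
```

Because each counter is only ever stored by one task, the administration needs nothing stronger than ordered loads and stores. The sketch assumes exactly one producer and one consumer per buffer; it is not safe with multiple writers on either side.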
