Streaming consistency: a model for efficient MPSoC design

Multiprocessor systems-on-chip (MPSoCs) with distributed shared memory and caches offer flexible inter-processor communication, but they require an efficient solution for memory consistency and cache coherency. In this paper we present streaming consistency, a novel memory consistency model for the streaming domain, in which tasks communicate through circular buffers. The model allows more reordering than release consistency and, among other optimizations, enables posted writes and an efficient software cache coherency solution. We also present a software cache coherency implementation and discuss a software circular buffer administration that does not need an atomic read-modify-write instruction. A small experiment demonstrates the potential performance gain from posted writes in MPSoCs with high communication latencies.
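
To illustrate how a circular buffer can be administered without an atomic read-modify-write instruction, the following is a minimal sketch of a common single-producer, single-consumer scheme. It is not taken from the paper. The names `spsc_buffer`, `spsc_init`, `spsc_put`, `spsc_get`, and `BUF_SLOTS` are hypothetical, and C11 acquire/release loads and stores stand in for whatever synchronization and cache maintenance operations a real MPSoC would use. The key assumption is that each counter has exactly one writer, so plain loads and stores suffice.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so that counter wrap-around
 * stays consistent with the modulo indexing below. */
#define BUF_SLOTS 4u

typedef struct {
    int32_t data[BUF_SLOTS];
    atomic_uint write_count; /* written only by the producer */
    atomic_uint read_count;  /* written only by the consumer */
} spsc_buffer;

static void spsc_init(spsc_buffer *b)
{
    atomic_init(&b->write_count, 0u);
    atomic_init(&b->read_count, 0u);
}

/* Producer side: returns false if the buffer is full. */
static bool spsc_put(spsc_buffer *b, int32_t value)
{
    unsigned w = atomic_load_explicit(&b->write_count, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&b->read_count, memory_order_acquire);
    if (w - r == BUF_SLOTS)
        return false;
    b->data[w % BUF_SLOTS] = value;
    /* Release store: the data write becomes visible before the new count. */
    atomic_store_explicit(&b->write_count, w + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
static bool spsc_get(spsc_buffer *b, int32_t *out)
{
    unsigned r = atomic_load_explicit(&b->read_count, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&b->write_count, memory_order_acquire);
    if (w == r)
        return false;
    *out = b->data[r % BUF_SLOTS];
    /* Release store: the slot is handed back only after it has been read. */
    atomic_store_explicit(&b->read_count, r + 1u, memory_order_release);
    return true;
}
```

Because the producer only ever stores to `write_count` and the consumer only ever stores to `read_count`, neither side needs a compare-and-swap or fetch-and-add. On a system with software cache coherency, the release and acquire points are presumably where cache flushes and invalidations would be placed, but that mapping is an assumption of this sketch.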
