Light-weight synchronization for inter-processor communication acceleration on embedded MPSoCs

The advances in semiconductor technologies have placed MPSoCscenter stage as a standard architecture for embedded applications of ever increasing complexity. Efficient utilization of the ample hardware resources requires applications to be decomposed into fine-grained threads, engendering in turn a large amount of interprocessor communications. While fine-grained on-chip interconnects can reduce the data transfer overhead, the traditional synchronization mechanisms, such as spin locks and barriers, still cause significant contention in polling shared variables. To overcome this issue, in this paper we propose a light-weight distributed synchronization mechanism which statically encodes the semantically correct order of accesses to each shared variable. A sharp reduction in the number of code bits is attained through a reference coloring algorithm, which furthermore enables an implementation within negligible hardware overhead. This light-weight synchronization mechanism allows dependent threads to frequently exchange data during execution, in turn enabling the exploration of fine-grained parallelism for applications with complex dependences.

[1]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[2]  Michael L. Scott,et al.  Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.

[3]  David A. Patterson,et al.  Computer architecture (2nd ed.): a quantitative approach , 1996 .

[4]  James R. Goodman,et al.  Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.

[5]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[6]  Jeffrey S. Vetter,et al.  Communication characteristics of large-scale scientific applications for contemporary cluster architectures , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[7]  Hideki Ando,et al.  An on-chip multiprocessor architecture with a non-blocking synchronization mechanism , 1999, Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium.

[8]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[9]  Wayne H. Wolf The future of multiprocessor systems-on-chips , 2004, Proceedings. 41st Design Automation Conference, 2004..

[10]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[11]  Kunle Olukotun,et al.  The case for a single-chip multiprocessor , 1996, ASPLOS VII.

[12]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[13]  Thomas E. Anderson,et al.  The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors , 1990, IEEE Trans. Parallel Distributed Syst..

[14]  Luca Benini,et al.  NoC synthesis flow for customized domain specific multiprocessor systems-on-chip , 2005, IEEE Transactions on Parallel and Distributed Systems.

[15]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.