Distributed and low-power synchronization architecture for embedded multiprocessors

We present a framework for a distributed, very low-cost implementation of synchronization controllers and protocols for embedded multiprocessors. The proposed architecture implements queued-lock semantics in a completely distributed way. This approach not only eliminates the heavy bus contention traffic that arises when multiple cores compete for a synchronization variable, but also achieves very high energy efficiency: the local synchronization controller can determine, without any bus transactions or local cache spinning, exactly when the lock becomes available to the local processor. The distributed synchronization protocol exploits application-specific information about the synchronization variables used in the local task. The local synchronization controllers also enable the system software or the thread library to implement various low-power policies, such as disabling cache accesses or even completely powering down the local processor while it waits for a synchronization variable.
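
To make the idea of distributed queued-lock semantics concrete, the following is a minimal software sketch, not the architecture proposed here. It assumes that every local controller observes each acquire and release event for a single lock and keeps its own head and tail counters, so a waiting controller knows its queue position and never polls. The names `LocalSyncController`, `Interconnect`, `snoop_acquire`, and `snoop_release` are hypothetical, and the way events reach the controllers is an assumption made purely for illustration.

```python
class LocalSyncController:
    """Hypothetical per-core controller tracking one queued lock."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.head = 0            # number of releases observed so far
        self.tail = 0            # number of acquire requests observed so far
        self.ticket = None       # this core's queue position, None if not queued
        self.core_asleep = False # stands in for a low-power wait state

    def snoop_acquire(self, requester):
        # Every controller sees the request; only the requester takes a ticket.
        if requester == self.core_id:
            self.ticket = self.tail
            # Wait in a low-power state instead of spinning on the lock.
            self.core_asleep = self.ticket != self.head
        self.tail += 1

    def snoop_release(self):
        # Count the release locally; wake the core only when its turn arrives.
        self.head += 1
        if self.ticket == self.head:
            self.core_asleep = False

    def holds_lock(self):
        return self.ticket is not None and self.ticket == self.head


class Interconnect:
    """Hypothetical event fabric: delivers each event to all controllers."""

    def __init__(self, num_cores):
        self.controllers = [LocalSyncController(i) for i in range(num_cores)]
        self.notifications = 0   # one per acquire or release, none for waiting

    def acquire(self, core_id):
        self.notifications += 1
        for ctl in self.controllers:
            ctl.snoop_acquire(core_id)

    def release(self, core_id):
        holder = self.controllers[core_id]
        if not holder.holds_lock():
            raise RuntimeError("release by a core that does not hold the lock")
        holder.ticket = None
        self.notifications += 1
        for ctl in self.controllers:
            ctl.snoop_release()


if __name__ == "__main__":
    net = Interconnect(3)
    for core in (0, 2, 1):       # cores request the lock in this order
        net.acquire(core)
    print([c.core_asleep for c in net.controllers])
    net.release(0)               # lock passes to core 2, the next in the queue
    print([c.holds_lock() for c in net.controllers])
```

In this sketch the number of notifications grows with the number of acquire and release operations rather than with the time spent waiting, which is the property that lets a waiting core stay in a low-power state until its turn.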
