Handling shared variable synchronization in multi-core Network-on-Chips with distributed memory

Parallelized shared variable applications running on multi-core Network-on-Chips (NoCs) require efficient support for synchronization, since communication is on the critical path of system performance and contended synchronization requests may cause large performance penalty. In this paper, we propose a dedicated hardware module for synchronization management. This module is called Synchronization Handler (SH), integrated with each processor-memory node on the multi-core NoCs. It uses two physical buffers to concurrently process synchronization requests issued by the local processor and remote processors via the on-chip network. One salient feature is that the two physical buffers are dynamically allocated to form multiple virtual buffers (a virtual buffer is related to a shared synchronization variable) so as to improve the buffer utilization and alleviate the head-of-line blocking. Synthesis results suggest that the SH can run over 900 MHz in 130nm technology with small area overhead. To justify the SH-enhanced multicore NoCs, we employ synthetic workloads to evaluate synchronization cost and buffer utilization, and run synchronization-intensive applications to investigate speedup. The results show that our approach is viable.