System-on-a-chip processor synchronization support in hardware

For scalable shared-memory multiprocessor System-on-a-Chip implementations, synchronization overhead may cause catastrophic stalls in the system. Efficiently reducing synchronization overhead in terms of latency, memory bandwidth, delay and scalability calls for a solution in hardware rather than in software. This paper presents a novel, efficient, small and very simple hardware unit that brings significant improvements in all of the above criteria: in one example, we reduce lock latency by a factor of 4.8 and the worst-case lock delay in a database application by a factor of more than 450. Furthermore, we developed a software architecture together with RTOS support to leverage our hardware mechanism. Worst-case simulation results of a client-server example on a four-processor system showed that our mechanism achieved an overall speedup of 27%.