The scalability of parallel programs is often bounded by the performance of the synchronization mechanisms used to protect critical sections. The performance of these mechanisms is in turn determined by how efficiently they use modern hardware and by whether threads can do useful work while waiting, or instead of waiting. This brief announcement sketches the idea and implementation of queue delegation locking, a synchronization mechanism that provides high throughput in two ways: threads can efficiently delegate their critical sections to the thread currently holding the lock, and threads that do not need a result from their critical section can continue executing immediately after delegating their work. Experiments show that queue delegation locking outperforms leading synchronization mechanisms because it combines fast operation transfer with the ability to let threads continue doing useful work instead of waiting. Thanks to its simple building blocks, even its uncontended overhead is low, making queue delegation locking useful in a wide variety of applications.
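The delegation idea described above can be illustrated with a small Python sketch. This is not the authors' implementation: it is a deliberately simplified stand-in built from a standard lock and a thread-safe queue, and the names `DelegationLockSketch`, `delegate`, `delegate_wait`, and `run_demo` are hypothetical. It assumes delegated operations passed to `delegate` do not raise, and it only shows the two behaviors the abstract names: handing a critical section to whichever thread currently holds the lock, and continuing immediately when no result is needed.

```python
import queue
import threading
from concurrent.futures import Future


class DelegationLockSketch:
    """Simplified delegation lock (illustrative only, not the paper's algorithm)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ops = queue.SimpleQueue()  # pending critical sections

    def delegate(self, op):
        """Queue a critical section and return without waiting for a result."""
        self._ops.put(op)
        self._help()

    def delegate_wait(self, op):
        """Queue a critical section and block until its return value is ready."""
        future = Future()

        def run():
            try:
                future.set_result(op())
            except Exception as exc:
                future.set_exception(exc)

        self._ops.put(run)
        self._help()
        return future.result()

    def _help(self):
        # Re-check the queue after every release, so an operation queued
        # while the lock was held is never left behind.
        while not self._ops.empty():
            if not self._lock.acquire(blocking=False):
                return  # the current holder will execute the queued operation
            try:
                while True:
                    try:
                        op = self._ops.get_nowait()
                    except queue.Empty:
                        break
                    op()  # runs while this thread holds the lock
            finally:
                self._lock.release()


def run_demo(num_threads=8, ops_per_thread=1000):
    """Hypothetical demo: many threads delegate increments of a shared counter."""
    lock = DelegationLockSketch()
    state = {"count": 0}

    def increment():
        state["count"] += 1  # only ever executed by the current lock holder

    def worker():
        for _ in range(ops_per_thread):
            lock.delegate(increment)  # fire and forget

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # This caller needs a result, so it waits for the delegated read.
    return lock.delegate_wait(lambda: state["count"])
```

In this sketch a thread that fails to take the lock simply returns, because the holder re-checks the queue after releasing and will pick up the pending operation. A thread that needs a value uses `delegate_wait`, which blocks on a future instead. The design choice being illustrated is the split between those two call styles; the queue and lock used here are generic building blocks chosen for brevity.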