Harnessing the power of massively parallel devices such as the graphics processing unit (GPU) is difficult for algorithms that exhibit dynamic or inhomogeneous workloads. To achieve high performance, such advanced algorithms require scalable, concurrent queues to collect and distribute work. We present the Broker Queue, a new, highly efficient, linearizable concurrent work queue for fine-granular work distribution on the GPU. We evaluate its usability and benefits in comparison with existing queuing algorithms. Our queue is up to one order of magnitude faster than non-blocking queues and outperforms simpler queue designs that are unfit for fine-granular work distribution.