Two key synchronization paradigms for the construction of scalable concurrent data structures are software combining and elimination. Elimination-based concurrent data structures allow operations with reverse semantics (such as push and pop stack operations) to "collide" and exchange values without having to access a central location. Software combining, on the other hand, is effective when colliding operations have identical semantics: when a pair of threads performing operations with identical semantics collide, the task of performing the combined set of operations is delegated to one of the threads, and the other thread waits for its operation(s) to be performed. Applying this mechanism iteratively can reduce memory contention and increase throughput.
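The two collision outcomes described above can be illustrated with a minimal, single-threaded sketch. It is not the DECS algorithm and involves no real concurrency; the names `Op`, `collide`, and `apply_to_stack` are hypothetical and introduced only for illustration. A push/pop pair is eliminated by exchanging the value directly, while two operations of the same kind are combined, with one operation acting as the delegate for the other.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

PUSH, POP = "push", "pop"


@dataclass
class Op:
    kind: str                  # PUSH or POP
    value: Any = None          # value to push (PUSH only)
    result: Any = None         # value handed to a POP
    done: bool = False         # True once the operation has been served
    # Operations this one has agreed to perform on behalf of others.
    delegated: List["Op"] = field(default_factory=list)


def collide(a: Op, b: Op) -> Optional[Op]:
    """Resolve a collision between two pending operations.

    Returns None if the pair was eliminated, otherwise the delegate
    that must still perform the combined operations.
    """
    if a.kind != b.kind:
        # Reverse semantics: eliminate. The pop takes the pushed value
        # directly, and neither operation touches the central stack.
        push, pop = (a, b) if a.kind == PUSH else (b, a)
        pop.result = push.value
        push.done = pop.done = True
        return None
    # Identical semantics: combine. 'a' becomes the delegate and takes
    # over b and anything b was already responsible for; b then waits.
    a.delegated.append(b)
    a.delegated.extend(b.delegated)
    b.delegated.clear()
    return a


def apply_to_stack(stack: list, delegate: Op) -> None:
    """The delegate performs its own operation and all delegated ones."""
    for op in [delegate, *delegate.delegated]:
        if op.kind == PUSH:
            stack.append(op.value)
        else:
            op.result = stack.pop() if stack else None
        op.done = True
```

In this sketch, combining can be applied repeatedly: a delegate that collides with another same-kind operation absorbs that operation's delegated list as well, so a single access to the central stack serves a growing batch of operations.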
The most highly scalable prior concurrent stack algorithm is the elimination-backoff stack [5]. The elimination-backoff stack provides high parallelism for symmetric workloads in which the numbers of push and pop operations are roughly equal, but its performance deteriorates when workloads are asymmetric.
We present DECS, a novel Dynamic Elimination-Combining Stack algorithm that scales well for all workload types. While maintaining the simplicity and low overhead of the elimination-backoff stack, DECS manages to benefit from collisions of both identical- and reverse-semantics operations. Our empirical evaluation shows that DECS scales significantly better than the best prior blocking and non-blocking stack algorithms.
[1] Nir Shavit, et al. Flat combining and the synchronization-parallelism tradeoff. SPAA '10, 2010.
[2] Maurice Herlihy, et al. Linearizability: a correctness condition for concurrent objects. TOPL, 1990.
[3] Maurice Herlihy, et al. Wait-free synchronization. TOPL, 1991.
[4] Satoshi Matsuoka, et al. An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers. PPOPP '93, 1992.
[5] Danny Hendler, et al. An Adaptive Technique for Constructing Robust and High-Throughput Shared Objects. OPODIS, 2010.
[6] Satoshi Matsuoka, et al. An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers. Parallel Symbolic Computing, 1992.
[7] Nir Shavit, et al. A scalable lock-free stack algorithm. SPAA '04, 2004.
[8] Maged M. Michael, et al. Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors. J. Parallel Distributed Comput., 1998.
[9] Nir Shavit, et al. Elimination Trees and the Construction of Pools and Stacks. Theory of Computing Systems, 1997.
[10] Danny Hendler, et al. Bounded-wait combining: constructing robust and high-throughput shared objects. Distributed Computing, 2009.
[11] Nir Shavit, et al. Scalable Producer-Consumer Pools Based on Elimination-Diffraction Trees. Euro-Par, 2010.
[12] Nian-Feng Tzeng, et al. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors. IEEE Transactions on Computers, 1987.