Constructing Shared Objects That Are Both Robust and High-Throughput

Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or suffer from a sequential bottleneck. We present the first counter algorithm that is linearizable, nonblocking, and provably achieves high throughput in semisynchronous executions. The algorithm is based on a novel variation of the software combining paradigm that we call bounded-wait combining. It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as stacks and queues. Unlike previous combining algorithms, in which processes may have to wait for each other indefinitely, in the bounded-wait combining algorithm a process waits for other processes only for a bounded period of time and then 'takes destiny into its own hands'. To reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm can achieve throughput of Ω(N / log N) in executions where process speeds vary only by a constant factor, where N is the number of processes that can participate in the algorithm. We also introduce and use pseudo-transactions, a technique for concurrent execution that may prove useful for other algorithms.
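To make the bounded-wait idea concrete, here is a minimal illustrative sketch (not the paper's algorithm, and not its semisynchronous model): each process posts an increment request, waits a bounded time for some combiner to serve it, and on timeout competes for the combiner role itself, applying all pending requests in one pass. The class name, the `bound` parameter, and the use of Python locks/queues are all assumptions made for illustration only.

```python
import threading
import queue

class BoundedWaitCounter:
    """Illustrative sketch of bounded-wait combining.

    Each process posts its increment to a shared buffer; whoever holds the
    combiner lock applies every pending increment in a single pass. A process
    waits at most `bound` seconds to be served, then combines on its own.
    """

    def __init__(self, bound=0.01):
        self.value = 0                      # the shared counter
        self.bound = bound                  # bounded waiting period
        self.lock = threading.Lock()        # guards the combining pass
        self.pending = queue.SimpleQueue()  # posted, not-yet-applied requests

    def _combine(self):
        # Apply every pending request in one pass, signalling each waiter.
        while True:
            try:
                req = self.pending.get_nowait()
            except queue.Empty:
                return
            self.value += 1
            req.set()  # tell the posting process its increment took effect

    def increment(self):
        done = threading.Event()
        self.pending.put(done)
        # Wait a bounded time for some combiner to serve this request ...
        if not done.wait(self.bound):
            # ... then "take destiny into our own hands": combine ourselves.
            with self.lock:
                self._combine()
            # Our request was applied either by us or by a concurrent combiner.
            done.wait()
```

A quick usage pattern: spawn several threads that each call `increment()`; whichever thread times out first drains the whole buffer, so most increments are applied by a single combining pass rather than by per-thread synchronization on the counter itself.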
