A time complexity lower bound for randomized implementations of some shared objects

Many recent wait-free implementations are based on a shared memory that supports a pair of synchronization operations, known as LL and SC. In this paper, we establish an intrinsic performance limitation of these operations: even the simple wakeup problem [16], which requires some process to detect that all n processes are up, cannot be solved unless some process performs Ω(log n) shared-memory operations. Using this basic result, we derive an Ω(log n) lower bound on the worst-case shared-access time complexity of n-process implementations of several types of objects, including fetch&increment, fetch&multiply, fetch&and, queue, and stack. (The worst-case shared-access time complexity of an implementation is the number of shared-memory operations that a process performs, in the worst case, in order to complete a single operation on the implementation.) Our lower bound is strong in several ways: it holds even if (1) the shared memory has an infinite number of words, each of unbounded size, (2) the shared memory supports move and swap operations in addition to LL, SC, and validate, (3) the implementation employs randomization (in this case, the worst-case expected shared-access time complexity is Ω(log n)), and (4) each process applies only one operation on the implemented object.

Finally, the lower bound is tight: if the size of shared registers is not restricted, the universal construction of Afek, Dauber, and Touitou [1] (after two minor modifications) has O(log n) worst-case shared-access time complexity. An n-process universal construction can be instantiated with the sequential specification of any type T to obtain a wait-free implementation of an atomic object of type T that n concurrent processes can share. A universal construction is oblivious if it does not exploit the semantics of the type that it is instantiated with.
Our lower bound implies that for any shared object O implemented using any oblivious universal construction, no matter what O's type is, in the worst case a process performs Ω(log n) operations on shared memory in order to complete a single operation on O. Thus, if our goal is to implement shared objects whose operations run in sublogarithmic time (preferably constant time), oblivious universal constructions cannot be useful; the design of sublogarithmic-time implementations must necessarily exploit the semantics of the type being implemented.

∗This work is partially supported by NSF RIA grant CCR9410421.
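To make the setting concrete: LL (load-linked) reads a shared word, and a subsequent SC (store-conditional) by the same process writes the word only if no process has written it since that LL; otherwise the SC fails and the caller typically retries. The sketch below simulates these semantics sequentially and uses them in the standard retry loop for fetch&increment, one of the object types covered by the lower bound. The class and function names (`LLSCWord`, `fetch_and_increment`) are illustrative, not from the paper, and a single-threaded simulation is only a model of the shared-memory primitive, not a real concurrent implementation.

```python
class LLSCWord:
    """Simulates one shared-memory word supporting LL/SC (illustrative model)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0   # incremented on every successful write
        self.links = {}    # process id -> version observed at its last LL

    def ll(self, pid):
        """Load-linked: read the word and record the version seen."""
        self.links[pid] = self.version
        return self.value

    def sc(self, pid, new_value):
        """Store-conditional: write only if no write occurred since pid's LL."""
        if self.links.get(pid) == self.version:
            self.value = new_value
            self.version += 1
            return True
        return False


def fetch_and_increment(word, pid):
    """Classic LL/SC retry loop: re-read and retry until the SC succeeds."""
    while True:
        old = word.ll(pid)
        if word.sc(pid, old + 1):
            return old
```

For example, if process 0 performs LL, then process 1 performs LL and a successful SC, process 0's subsequent SC fails because the word's version has changed since its LL, and its retry loop must run again; the lower bound in the paper bounds the total shared-memory operations such loops can need in the worst case.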

[1] Michael J. Fischer, et al. The wakeup problem, 1990, STOC '90.

[2] James Aspnes, et al. Lower bounds for distributed coin-flipping and randomized consensus, 1997, STOC '97.

[3] Mark Moir, et al. Universal Constructions for Large Objects, 1995, IEEE Trans. Parallel Distributed Syst..

[4] Amos Israeli, et al. Disjoint-access-parallel implementations of strong shared memory primitives, 1994, PODC '94.

[5] Travis S. Craig, et al. Building FIFO and Priority-Queuing Spin Locks from Atomic Swap, 1993.

[6] Gerry Kane, et al. MIPS RISC Architecture, 1987.

[7] Tushar Deepak Chandra, et al. A polylog time wait-free construction for closed objects, 1998, PODC '98.

[8] Dennis Shasha, et al. Locking without blocking: making lock based concurrent data structure algorithms nonblocking, 1992, PODS '92.

[9] Maurice Herlihy, et al. A methodology for implementing highly concurrent data objects, 1993, TOPL.

[10] B. Bershad. Practical considerations for lock-free concurrent objects, 1991.

[11] Mark Moir, et al. Universal constructions for multi-object operations, 1995, PODC '95.

[12] Prasad Jayanti, et al. A lower bound on the local time complexity of universal constructions, 1998, PODC '98.

[13] Mark Moir. Practical implementations of non-blocking synchronization primitives, 1997, PODC '97.

[14] Maurice Herlihy, et al. A methodology for implementing highly concurrent data structures, 1990, PPOPP '90.

[15] Mark Moir, et al. Efficient object sharing in shared-memory multiprocessors, 1996.

[16] Maurice Herlihy, et al. Linearizability: a correctness condition for concurrent objects, 1990, TOPL.

[17] Hagit Attiya, et al. Universal operations: unary versus binary, 1996, PODC '96.

[18] Gadi Taubenfeld, et al. Disentangling Multi-object Operations, 1997.

[19] Thomas E. Anderson, et al. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors, 1990, IEEE Trans. Parallel Distributed Syst..

[20] Yehuda Afek, et al. Wait-free made fast, 1995, STOC '95.

[21] Maurice Herlihy, et al. Impossibility and universality results for wait-free synchronization, 1988, PODC '88.

[22] Robert Sims, et al. Alpha architecture reference manual, 1992.

[23] Travis S. Craig. Queuing spin lock algorithms to support timing predictability, 1993, Proceedings Real-Time Systems Symposium.

[24] Greg Barnes, et al. A method for implementing lock-free shared-data structures, 1993, SPAA '93.

[25] Maurice Herlihy, et al. On the space complexity of randomized synchronization, 1993, PODC '93.

[26] Maurice Herlihy, et al. Wait-free synchronization, 1991, TOPL.

[27] Mark Moir, et al. Transparent Support for Wait-Free Transactions, 1997, WDAG.

[28] Cathy May, et al. The PowerPC Architecture: A Specification for a New Family of RISC Processors, 1994.

[29] Richard L. Sites, et al. Alpha Architecture Reference Manual, 1995.

[30] Sam Toueg, et al. Time and space lower bounds for non-blocking implementations (preliminary version), 1996, PODC '96.

[31] Robert Cypher. The communication requirements of mutual exclusion, 1995, SPAA '95.

[32] Nir Shavit, et al. Software transactional memory, 1995, PODC '95.

[33] Yehuda Afek, et al. Disentangling multi-object operations (extended abstract), 1997, PODC '97.

[34] Nancy A. Lynch, et al. Are wait-free algorithms fast?, 1994, JACM.

[35] Michael L. Scott, et al. Algorithms for scalable synchronization on shared-memory multiprocessors, 1991, TOCS.