A Scalable, Correct Time-Stamped Stack

Concurrent data-structures, such as stacks, queues, and deques, often implicitly enforce a total order over elements in their underlying memory layout. However, much of this order is unnecessary: linearizability only requires that elements are ordered if the insert methods ran in sequence. We propose a new approach which uses timestamping to avoid unnecessary ordering. Pairs of elements can be left unordered if their associated insert operations ran concurrently, with order imposed as necessary at the eventual removal. We realise our approach in a new non-blocking data-structure, the TS (timestamped) stack. Using the same approach, we can define corresponding queue and deque data-structures. In experiments on x86, the TS stack outperforms and outscales all its competitors: for example, it outperforms the elimination-backoff stack by a factor of two. In our approach, more concurrency translates into less ordering, giving less-contended removal and thus higher performance and scalability. Despite this, the TS stack is linearizable with respect to stack semantics. The weak internal ordering in the TS stack presents a challenge when establishing linearizability: standard techniques such as linearization points work well only when there exists a total internal order. We present a new stack theorem, mechanised in Isabelle, which characterises the orderings sufficient to establish stack semantics. By applying our stack theorem, we show that the TS stack is indeed linearizable. Our theorem constitutes a new, generic proof technique for concurrent stacks, and it paves the way for future weakly ordered data-structure designs.
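To make the timestamping idea concrete, the sketch below shows a minimal, lock-based Java illustration of interval timestamps. The class SketchTSStack, the shared AtomicLong clock, and all method bodies are our own illustrative assumptions, not the paper's algorithm: the real TS stack is non-blocking and uses per-thread pools with careful lock-free removal. A push takes two clock readings to form an interval; pushes that run in sequence get disjoint intervals and hence a fixed order, while concurrent pushes may get overlapping intervals and remain unordered. A pop removes any maximal element, i.e. one that no other element is definitely younger than.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// A minimal, lock-based sketch of interval timestamping (illustrative only;
// NOT the non-blocking TS stack algorithm from the paper).
public class SketchTSStack<T> {
    // Shared logical clock; a hardware cycle counter is a cheaper alternative.
    private static final AtomicLong CLOCK = new AtomicLong();

    // Interval timestamp [start, end]: b is *definitely younger* than a
    // iff a.end < b.start; overlapping intervals leave the pair unordered.
    private record Item<V>(V value, long start, long end) {}

    private final List<Item<T>> items = new ArrayList<>();

    public synchronized void push(T value) {
        // Two clock readings bracket the push. Sequential pushes get
        // disjoint intervals (a fixed order); pushes that overlap in time
        // may get overlapping intervals and stay unordered.
        long start = CLOCK.getAndIncrement();
        long end = CLOCK.getAndIncrement();
        items.add(new Item<>(value, start, end));
    }

    public synchronized T pop() {
        if (items.isEmpty()) throw new IllegalStateException("empty stack");
        // LIFO removal picks a *maximal* element: one that no other element
        // is definitely younger than. The element with the largest start is
        // always maximal, since every other element's start is <= it.
        int max = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i).start() > items.get(max).start()) max = i;
        }
        return items.remove(max).value();
    }
}
```

Because pop only needs some maximal element rather than the unique top of a totally ordered stack, concurrent removals have more freedom to pick different elements; this is the source of the less-contended removal, and hence the scalability, claimed in the abstract.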
