Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures

We introduce "asynchronized concurrency" (ASCY), a paradigm consisting of four complementary programming patterns. ASCY calls for concurrent search data structures (CSDSs) to be designed so that they resemble their sequential counterparts. We argue that ASCY leads to implementations that are portably scalable: they scale across different types of hardware platforms, including single- and multi-socket ones; for various classes of workloads, such as read-only and read-write; and according to different performance metrics, including throughput, latency, and energy. We substantiate our thesis through the most exhaustive evaluation of CSDSs to date, involving 6 platforms, 22 state-of-the-art CSDS algorithms, 10 re-engineered state-of-the-art CSDS algorithms that follow the ASCY patterns, and 2 new CSDS algorithms designed with ASCY in mind. We observe throughput improvements of up to 30% in the re-engineered algorithms, while our new algorithms outperform the state-of-the-art alternatives.
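The abstract does not spell out the four patterns, so the following is only a minimal sketch of the general idea that a concurrent search structure can keep its search path close to the sequential code. It is not one of the algorithms evaluated in the paper. The class `SortedListSet`, its method names, and the single global write lock are assumptions made for illustration. The point to notice is that `contains` is the plain sequential traversal: it takes no lock, never retries, and writes nothing, while updates publish their change with a single pointer store.

```python
import threading


class Node:
    """A list node; `next` is only ever changed by a single reference store."""
    __slots__ = ("key", "next")

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class SortedListSet:
    """Hypothetical sorted linked-list set, used only to illustrate the idea."""

    def __init__(self):
        # Sentinel head and tail nodes keep the traversal free of None checks.
        self._head = Node(float("-inf"), Node(float("inf")))
        # Assumption: one global lock serializes updates. This is a
        # simplification for the sketch, not a scalable design.
        self._write_lock = threading.Lock()

    def contains(self, key):
        # Identical to the sequential search: no locks, no retries, no stores.
        curr = self._head.next
        while curr.key < key:
            curr = curr.next
        return curr.key == key

    def _locate(self, key):
        # Read-only parse phase shared by insert and remove.
        pred, curr = self._head, self._head.next
        while curr.key < key:
            pred, curr = curr, curr.next
        return pred, curr

    def insert(self, key):
        with self._write_lock:
            pred, curr = self._locate(key)
            if curr.key == key:
                return False  # unsuccessful update: the list is not modified
            # The new node is fully built before the one store that links it.
            pred.next = Node(key, curr)
            return True

    def remove(self, key):
        with self._write_lock:
            pred, curr = self._locate(key)
            if curr.key != key:
                return False  # unsuccessful update: the list is not modified
            # One store unlinks the node; its own `next` is left intact so a
            # reader currently standing on it can still reach the rest.
            pred.next = curr.next
            return True
```

In this sketch, readers never contend with each other or with writers, which is the property the sequential-looking search is meant to preserve. A design aiming for the scalability the abstract describes would also need to avoid the global write lock; that part is deliberately left out here.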
