A Universal Construction to Implement Concurrent Data Structures for NUMA Multicore

Universal constructions are attractive because they can turn a sequential implementation of any data structure into a concurrent one. However, existing universal constructions have limitations, such as high copying overhead or poor scalability on NUMA systems, the latter mainly because they lack NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by using delegation within each NUMA node and a global shared log to keep the replicas of a data structure consistent across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR achieves up to 11.2 times better performance than CX, a state-of-the-art universal construction, on the sequential data structures we tested. To demonstrate the effectiveness and applicability of CR, we applied it to an in-memory database system, which shows up to 18.1 times better performance than the original version.
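The general idea of per-node replicas kept consistent through a shared log can be illustrated with a minimal sketch. This is not CR's actual algorithm: the names `SharedLog`, `Replica`, `update`, and `read` are hypothetical, a plain lock per replica stands in for delegation within a NUMA node, and nothing here is NUMA-aware or tuned for performance. It only shows how an unmodified sequential structure (here a `dict`) can be replicated by replaying logged updates.

```python
import threading


class SharedLog:
    """Append-only log of update operations shared by all replicas (hypothetical)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def append(self, op):
        # Appending under one lock gives every update a single global order.
        with self._lock:
            self._entries.append(op)

    def read_from(self, start):
        # Return a snapshot of all entries at positions >= start.
        with self._lock:
            return self._entries[start:]


class Replica:
    """One copy of a sequential structure, imagined as living on one NUMA node."""

    def __init__(self, log, make_seq):
        self._log = log
        self._state = make_seq()   # unmodified sequential data structure
        self._applied = 0          # number of log entries already replayed
        # Assumption: a lock stands in for per-node delegation, in which a
        # single thread would apply operations on behalf of its node.
        self._lock = threading.Lock()

    def _catch_up(self):
        # Replay every logged update this replica has not applied yet.
        for op in self._log.read_from(self._applied):
            op(self._state)
            self._applied += 1

    def update(self, op):
        with self._lock:
            self._log.append(op)
            self._catch_up()

    def read(self, query):
        with self._lock:
            self._catch_up()       # bring the local copy up to date first
            return query(self._state)


# Usage: two replicas of a plain dict observe each other's updates.
log = SharedLog()
node0 = Replica(log, dict)
node1 = Replica(log, dict)
node0.update(lambda d: d.__setitem__("k", 1))
node1.update(lambda d: d.__setitem__("j", 2))
```

In this sketch every read first replays outstanding log entries, so a reader on one replica sees updates issued through another. A real design would need to bound the log and avoid the single log lock, which this sketch does not attempt.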
