Parallel Combining: Benefits of Explicit Synchronization

Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel batched one. The idea is to explicitly synchronize concurrent operations into batches: one of the processes becomes a combiner, which collects concurrent requests and initiates a parallel batched algorithm involving the owners (clients) of the collected requests. Intuitively, the cost of synchronizing the concurrent calls can be offset by the gains from running the parallel batched algorithm. We validate this intuition via two applications of parallel combining. First, we use our technique to design a concurrent data structure optimized for read-dominated workloads, taking a dynamic graph data structure as an example. Second, we use a novel parallel batched priority queue to build a concurrent one. In both cases, we obtain performance gains over state-of-the-art algorithms.
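
The combiner/client protocol described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Request`, `CombiningSet`, and `execute` are hypothetical, a plain set stands in for the dynamic graph, and the "parallel batched algorithm" is reduced to the read-dominated case, where the combiner applies the batch's updates sequentially and then lets each client run its own read while the combiner waits. CPython's global interpreter lock prevents real parallel speedup, so the sketch shows only the synchronization pattern, not the performance gains.

```python
import threading


class Request:
    """One published operation, owned by the client thread that issued it."""

    def __init__(self, op, key):
        self.op = op                      # "insert", "remove", or "contains"
        self.key = key
        self.result = None
        self.started = threading.Event()  # combiner -> client: run your read now
        self.done = threading.Event()     # set once the result is available


class CombiningSet:
    """Hypothetical read-optimized set built with a combiner (a sketch)."""

    def __init__(self):
        self._items = set()
        self._combiner_lock = threading.Lock()  # its holder is the combiner
        self._pub_lock = threading.Lock()       # guards the publication list
        self._pending = []

    def execute(self, op, key):
        if op not in ("insert", "remove", "contains"):
            raise ValueError(f"unknown operation: {op}")
        req = Request(op, key)
        with self._pub_lock:
            self._pending.append(req)           # publish the request
        while not req.done.is_set():
            if self._combiner_lock.acquire(blocking=False):
                try:
                    if not req.done.is_set():
                        self._combine(req)      # this thread is the combiner
                finally:
                    self._combiner_lock.release()
            elif req.started.wait(timeout=0.001):
                self._run_read(req)             # client performs its own read
        return req.result

    def _combine(self, own):
        with self._pub_lock:                    # collect the current batch
            batch, self._pending = self._pending, []
        reads = []
        for r in batch:
            if r.op == "contains":
                reads.append(r)
            else:
                self._apply_update(r)           # updates: sequential, by combiner
                r.done.set()
        for r in reads:
            if r is not own:
                r.started.set()                 # release clients to read
        if own.op == "contains":
            self._run_read(own)                 # combiner's own read, if any
        for r in reads:
            r.done.wait()                       # hold the lock until reads finish

    def _apply_update(self, r):
        if r.op == "insert":
            r.result = r.key not in self._items
            self._items.add(r.key)
        else:                                   # "remove"
            r.result = r.key in self._items
            self._items.discard(r.key)

    def _run_read(self, r):
        r.result = r.key in self._items
        r.done.set()
```

Each thread simply calls `execute`, for example `s.execute("insert", 7)` followed by `s.execute("contains", 7)`. Because the combiner keeps the lock until every read in its batch has finished, no update can run concurrently with a read; within a batch, all updates take effect before the reads. That ordering is a simplification chosen for this sketch, not a claim about the algorithm in the paper.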
