Parallel Finger Search Structures

In this paper we present two versions of a parallel finger structure FS on p processors that supports searches, insertions and deletions, and has a finger at each end. To our knowledge, this is the first implementation of a parallel search structure that is work-optimal with respect to the finger bound while still achieving parallelism within a factor of O( (log p)^2 ) of optimal. We use an extended implicit batching framework that transparently facilitates the use of FS by any parallel program P modelled by a dynamically generated DAG D in which each node is either a unit-time instruction or a call to FS. The total work done by either version of FS is bounded by the finger bound F[L] (for some linearization L of D), i.e. each operation on an item with finger distance r takes O( log r + 1 ) amortized work, so operations on items closer to a finger are cheaper. Running P using the simpler version takes O( ( T[1] + F[L] ) / p + T[inf] + d * ( (log p)^2 + log n ) ) time on a greedy scheduler, where T[1] and T[inf] are the size and span of D respectively, n is the maximum number of items in FS, and d is the maximum number of calls to FS along any path in D. Using the faster version, this is reduced to O( ( T[1] + F[L] ) / p + T[inf] + d * (log p)^2 + s[L] ) time, where s[L] is the weighted span of D in which each call to FS is weighted by its cost according to F[L]. We also sketch how to extend FS to support a fixed number of movable fingers. The data structures in this paper fit into the dynamic multithreading paradigm, and their performance bounds are directly composable with those of other data structures given in the same paradigm. The results can also be translated to practical implementations using work-stealing schedulers.
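To make the finger bound concrete, the following is a minimal sequential sketch in Python. It is not the parallel structure FS. It only evaluates a finger-bound-style cost for one fixed linearization of operations on a sorted list with a finger at each end. The names `finger_distance` and `finger_bound` are hypothetical. Two conventions are assumptions made for illustration: the distance r is taken as the number of stored items strictly between the key and the nearer end, measured just before the operation, and the term O( log r + 1 ) is instantiated as log2(r + 1) + 1.

```python
import bisect
import math


def finger_distance(items, key):
    """Distance from `key` to the nearer end of the sorted list `items`.

    Assumed convention: count the stored items strictly smaller than
    `key` and those strictly larger, and take the minimum of the two.
    """
    smaller = bisect.bisect_left(items, key)
    larger = len(items) - bisect.bisect_right(items, key)
    return min(smaller, larger)


def finger_bound(ops, initial=()):
    """Sum an illustrative per-operation cost over a linearization `ops`.

    `ops` is a sequence of (kind, key) pairs with kind in
    {"search", "insert", "delete"}. Each operation is charged
    log2(r + 1) + 1, a concrete stand-in for O(log r + 1).
    """
    items = sorted(initial)
    total = 0.0
    for kind, key in ops:
        r = finger_distance(items, key)
        total += math.log2(r + 1) + 1
        i = bisect.bisect_left(items, key)
        present = i < len(items) and items[i] == key
        if kind == "insert" and not present:
            items.insert(i, key)
        elif kind == "delete" and present:
            items.pop(i)
    return total


if __name__ == "__main__":
    keys = range(16)
    # An item at an end has distance 0 and is charged the minimum cost.
    print(finger_bound([("search", 0)], keys))
    # An item near the middle is charged roughly log2 of its distance.
    print(finger_bound([("search", 8)], keys))
```

Under these assumptions, operations near either end of the list are charged a constant, while operations near the middle are charged about log n, which is the behaviour the finger bound is meant to capture.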
