Efficient Concurrent Search Trees Using Portable Fine-Grained Locality

Concurrent search trees are crucial data abstractions widely used in many important systems such as databases, file systems, and data storage. Like other fundamental abstractions for energy-efficient computing, concurrent search trees should support both high concurrency and fine-grained data locality in a platform-independent manner. However, existing portable fine-grained locality-aware search trees, such as those based on the van Emde Boas layout (vEB-based trees), poorly support concurrent update operations, while existing highly concurrent search trees, such as non-blocking search trees, do not consider fine-grained data locality. In this paper, we first present a novel methodology to achieve both portable fine-grained data locality and high concurrency for search trees. Based on the methodology, we devise a novel locality-aware concurrent search tree called GreenBST. To the best of our knowledge, GreenBST is the first practical search tree that achieves both portable fine-grained data locality and high concurrency. We analyze and compare GreenBST's energy efficiency (in operations/Joule) and performance (in operations/second) with seven prominent concurrent search trees on a high performance computing (HPC) platform (Intel Xeon), an embedded platform (ARM), and an accelerator platform (Intel Xeon Phi) using parallel microbenchmarks (Synchrobench). Our experimental results show that GreenBST achieves the best energy efficiency and performance on all the different platforms. GreenBST achieves up to 50 percent higher energy efficiency and 60 percent higher throughput than the best competitor in the parallel benchmarks. These results confirm the viability of our new methodology to achieve both portable fine-grained data locality and high concurrency for search trees.
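The portable fine-grained locality of vEB-based trees comes from laying out a complete binary tree recursively: the tree is split at half its height into a top subtree and several bottom subtrees, each stored contiguously and laid out by the same rule, so that a root-to-leaf search touches few memory blocks at every level of the memory hierarchy without knowing the block size. The following is a minimal, sequential sketch of that layout, not GreenBST itself. It assumes a complete tree holding exactly 2^h - 1 keys, gives the top subtree floor(h/2) levels (conventions differ between papers), and uses a lookup table (`pos_of`) from breadth-first (BFS) node indices to array positions for readability instead of computing positions arithmetically. All function names are hypothetical.

```python
def veb_order(root: int, height: int, out: list[int]) -> None:
    """Append the 1-based BFS indices of the complete subtree rooted at `root`
    (with `height` levels) to `out` in van Emde Boas order."""
    if height == 1:
        out.append(root)
        return
    top_h = height // 2          # levels in the top recursive subtree (assumed convention)
    bot_h = height - top_h       # levels in each bottom recursive subtree
    veb_order(root, top_h, out)  # the top subtree is stored first, contiguously
    # The bottom subtrees hang below the top subtree's leaves: their roots are the
    # 2**top_h descendants of `root` exactly top_h levels below it.
    first = root << top_h
    for child in range(first, first + (1 << top_h)):
        veb_order(child, bot_h, out)  # each bottom subtree is also contiguous


def build_veb_tree(sorted_keys: list[int], height: int) -> tuple[list[int], list[int]]:
    """Store a complete BST over `sorted_keys` in a flat array in vEB order.
    Returns (tree, pos_of), where tree[pos_of[i]] is the key of BFS node i."""
    n = (1 << height) - 1
    if len(sorted_keys) != n:
        raise ValueError("sketch assumes exactly 2**height - 1 keys")
    # Assign keys to BFS nodes by in-order traversal, which yields a valid BST.
    bfs_keys = [0] * (n + 1)
    keys = iter(sorted_keys)

    def inorder(i: int) -> None:
        if i > n:
            return
        inorder(2 * i)
        bfs_keys[i] = next(keys)
        inorder(2 * i + 1)

    inorder(1)
    order: list[int] = []
    veb_order(1, height, order)
    pos_of = [0] * (n + 1)
    for pos, bfs in enumerate(order):
        pos_of[bfs] = pos
    tree = [bfs_keys[bfs] for bfs in order]
    return tree, pos_of


def search(tree: list[int], pos_of: list[int], key: int) -> bool:
    """Ordinary BST descent over BFS indices; only the memory layout is vEB."""
    i = 1
    while i <= len(tree):
        k = tree[pos_of[i]]
        if key == k:
            return True
        i = 2 * i if key < k else 2 * i + 1
    return False


order: list[int] = []
veb_order(1, 4, order)
print(order)  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

For a height-4 tree, the top subtree (BFS nodes 1, 2, 3) is followed by the four bottom subtrees, each of which occupies three adjacent array slots. A concurrent tree must additionally keep this layout intact under inserts and deletes, which is the difficulty the abstract attributes to existing vEB-based trees.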
