Brie: A Specialized Trie for Concurrent Datalog

Modern Datalog engines are employed in industrial applications such as graph databases, networks, and static program analysis. To cope with the vast amount of data in these applications, Datalog engines must employ specialized parallel data structures. In this paper, we introduce the Brie, a specialized data structure for high-density relations storing large data volumes. It effectively compresses dense data in a lock-free fashion and obtains up to 15× higher performance in parallel insertion benchmarks compared to state-of-the-art alternatives. Furthermore, when integrated into a Datalog engine running an industrial points-to analysis, it improves runtime by a factor of 4 and achieves a compression ratio of up to 3.6×.
