A Scheduling Approach to Incremental Maintenance of Datalog Programs

In this paper, we study the problem of incremental maintenance of Datalog programs and model it as a scheduling problem on DAGs. We design provably good time- and memory-efficient scheduling algorithms for (re)executing a Datalog program where some (but not necessarily all) of the inputs have changed. We prove that our schedulers, called LevelBased and LevelBased with lookahead, have asymptotically better running time and space efficiency than the benchmark algorithms used in production at LogicBlox. The main result of the paper is a hybrid scheduler, which combines LevelBased with the production LogicBlox scheduler (or any other heuristic scheduler). The hybrid scheduler achieves strong worst-case guarantees and robustness without losing the best-case behavior of the production LogicBlox scheduler. Our experiments show that the hybrid scheduler yields similar or improved total execution times compared with the LogicBlox scheduler, while consistently reducing the scheduling overhead, by as much as 50% on some datasets. This hybrid scheme adds little to no overhead but provides predictability and reliability, which are crucial in a commercial application such as LogicBlox.
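
The abstract does not spell out how LevelBased works, so the following is only an illustrative sketch of one plausible level-based scheme, not the paper's algorithm or any LogicBlox API. It assumes that the Datalog program has been compiled into an acyclic dependency graph, that a node's level is its longest-path distance from a source, that nodes downstream of a changed input are re-executed in nondecreasing level order, and that propagation stops at any node whose output did not change. The names `compute_levels`, `level_based_schedule`, and the `execute` callback are hypothetical, and node identifiers are assumed to be mutually comparable (strings here) so that heap ties can be broken.

```python
import heapq
from collections import defaultdict


def compute_levels(nodes, edges):
    """Return (level, succ): level[v] is the longest-path distance from any
    source to v (0 for sources); succ maps each node to its successors.
    Assumes the graph is a DAG (hypothetical helper)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {v: 0 for v in nodes}
    # Kahn's algorithm: a node is expanded only after all its predecessors,
    # so its level is final by the time it is expanded.
    frontier = [v for v in nodes if indeg[v] == 0]
    while frontier:
        u = frontier.pop()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    return level, succ


def level_based_schedule(nodes, edges, changed_inputs, execute):
    """Re-execute only nodes downstream of changed inputs, lowest level first.
    execute(v) (hypothetical callback) recomputes v and returns True if v's
    output changed. Returns the executed nodes in order (inputs excluded)."""
    level, succ = compute_levels(nodes, edges)
    heap, queued = [], set()

    def enqueue_successors(u):
        for v in succ[u]:
            if v not in queued:  # each node enters the heap at most once
                queued.add(v)
                heapq.heappush(heap, (level[v], v))

    for u in changed_inputs:
        enqueue_successors(u)
    order = []
    while heap:
        _, v = heapq.heappop(heap)  # lowest level first; ties broken by node id
        order.append(v)
        if execute(v):
            enqueue_successors(v)  # propagate only if v's output changed
    return order


# Hypothetical program: inputs a and b; derived predicates c, d, e, f.
dag_nodes = ["a", "b", "c", "d", "e", "f"]
dag_edges = [("a", "c"), ("a", "d"), ("b", "d"), ("b", "f"), ("c", "e"), ("d", "e")]
print(level_based_schedule(dag_nodes, dag_edges, ["a"], lambda v: True))
# ['c', 'd', 'e']
```

In this run, f is never touched because it does not depend on a, and e runs only once even though both of its predecessors were re-executed. Because a node is popped only after every queued node of lower level, none of its predecessors can change afterwards, so each affected node is executed at most once in this sketch; with `lambda v: False`, propagation stops after c and d.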