Distribution Policies for Datalog

Modern data management systems make extensive use of parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework for reasoning about the multi-round evaluation of Datalog programs. The framework combines implicit predicate restriction with distribution policies, which allows it to express combinations of data-parallel and query-parallel evaluation strategies. Using this framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
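The following minimal Python sketch makes the ideas of a distribution policy and multi-round evaluation concrete. It uses the transitive-closure program T(x,y) :- E(x,y) and T(x,y) :- T(x,z), E(z,y). It is not the framework from this paper. The hash-based policy, the function names (`policy`, `distributed_tc`, `centralized_tc`), and the round structure are illustrative assumptions. In each round, the sketch routes both body atoms of the recursive rule by the join variable z and evaluates every node's fragment locally. It then compares the final result with a centralized fixpoint, which is a simple instance of checking parallel-correctness.

```python
from collections import defaultdict


def centralized_tc(edges):
    """Reference fixpoint for T(x,y) :- E(x,y).  T(x,y) :- T(x,z), E(z,y)."""
    t = set(edges)
    while True:
        new = {(x, y) for (x, z) in t for (z2, y) in edges if z == z2} - t
        if not new:
            return t
        t |= new


def policy(value, num_nodes):
    # Hypothetical distribution policy: hash-partition a fact on its join value z.
    return hash(value) % num_nodes


def distributed_tc(edges, num_nodes=3):
    """Multi-round evaluation under the hypothetical hash policy.

    Returns the derived T-facts and the number of rounds executed,
    including the final round that derives nothing new.
    """
    # E(z, y) is static, so it is placed once on the node responsible for z.
    e_local = defaultdict(set)
    for (z, y) in edges:
        e_local[policy(z, num_nodes)].add((z, y))

    t = set(edges)      # the exit rule: T := E
    delta = set(edges)  # T-facts that are new since the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Communication step: route each new T(x, z) to the node responsible for z.
        t_local = defaultdict(set)
        for (x, z) in delta:
            t_local[policy(z, num_nodes)].add((x, z))
        # Local step: each node joins only the facts it holds.
        derived = set()
        for node in range(num_nodes):
            for (x, z) in t_local[node]:
                for (z2, y) in e_local[node]:
                    if z == z2:
                        derived.add((x, y))
        delta = derived - t
        t |= delta
    return t, rounds


edges = {(1, 2), (2, 3), (3, 4), (4, 5)}
result, rounds = distributed_tc(edges)
print(result == centralized_tc(edges), rounds)
```

Because both T(x,z) and E(z,y) are routed by z, every pair of facts that can join meets on some node, so this policy is parallel-correct for this program. Routing T-facts by x instead could miss derivations. In this sketch, each fact is sent to exactly one node in each round, so each pair of joining facts is combined on a single node. This only loosely mirrors the disjointness property discussed above. On the path 1→2→3→4→5, the loop runs four rounds, and the last round derives nothing new.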
