A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries

We study the optimal communication cost for computing a full conjunctive query Q over p distributed servers. Two prior results were known. First, for one-round algorithms over skew-free data, the optimal communication cost per server is m/p^(1/τ*), where m is the size of the largest input relation and τ* is the fractional vertex covering number of the query hypergraph. Second, for multi-round algorithms over unrestricted database instances, it was shown that any algorithm requires at least m/p^(1/ρ*) communication cost per server, where ρ* is the fractional edge covering number of the query hypergraph. However, no matching algorithms were known for this case, except for two restricted classes of queries: chains and cycles. In this paper, we describe a multi-round algorithm that computes any query with load m/p^(1/ρ*) per server when all input relations are binary. This proves that m/p^(1/ρ*) is the optimal load for all queries over binary input relations. Our algorithm is a non-trivial extension of previous algorithms for chains and cycles, and it exploits some properties unique to graphs that no longer hold for hypergraphs.
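To make the two load expressions concrete, here is a small sketch that computes τ* and ρ* for a query whose atoms are all binary, so that the query hypergraph is an ordinary graph. It then evaluates m/p^(1/cover). It only illustrates the quantities in the abstract and is not the paper's algorithm. The sketch assumes every atom joins two distinct variables (no atoms like R(x, x)). It also relies on two standard graph facts rather than anything taken from the paper. First, τ* equals half the maximum matching size of the bipartite double cover of the graph. Second, ρ* = |V| − τ* when there are no isolated vertices. The helper names `fractional_covers` and `load_per_server` are hypothetical.

```python
from fractions import Fraction


def _max_bipartite_matching(left, adj):
    """Size of a maximum matching via Kuhn's augmenting-path algorithm.

    `left` lists the left vertices; `adj[u]` lists the right vertices adjacent
    to u. Left and right vertices live in separate namespaces, so names may repeat.
    """
    match_right = {}  # right vertex -> left vertex it is currently matched to

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(1 for u in left if try_augment(u, set()))


def fractional_covers(edges):
    """Hypothetical helper: (tau*, rho*) for a query whose atoms are all binary.

    Assumes a simple graph: each edge joins two distinct variables.
    """
    vertices = sorted({x for e in edges for x in e})
    # Bipartite double cover: edge {u, v} becomes (u_left, v_right) and
    # (v_left, u_right), so the right-neighbours of u_left are u's neighbours.
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # tau*(G) = nu(double cover) / 2, via the double cover and Koenig's theorem.
    tau_star = Fraction(_max_bipartite_matching(vertices, adj), 2)
    # rho*(G) = |V| - tau*(G): LP duality plus the fractional Gallai identity.
    rho_star = len(vertices) - tau_star
    return tau_star, rho_star


def load_per_server(m, p, cover):
    """Hypothetical helper: the per-server load expression m / p^(1/cover)."""
    return m / p ** (1 / float(cover))


if __name__ == "__main__":
    # Triangle query R(x, y), S(y, z), T(z, x).
    triangle = [("x", "y"), ("y", "z"), ("z", "x")]
    tau, rho = fractional_covers(triangle)
    print(tau, rho)  # 3/2 3/2
    print(load_per_server(10**6, 64, rho))  # roughly 62500
```

For a star query such as R(x, a), S(x, b), T(x, c), the same sketch gives τ* = 1 but ρ* = 3. In that case, the one-round skew-free cost m/p and the multi-round worst-case bound m/p^(1/3) differ substantially.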
