Worst-Case Optimal Graph Joins in Almost No Space

We present an indexing scheme that supports worst-case optimal (wco) joins over graphs within compact space. Supporting all possible wco joins using conventional data structures (based on B(+)-trees, tries, etc.) requires 6 index orders in the case of graphs represented as triples. We instead propose a form of index, which we call a ring, that indexes each triple as a set of cyclic bidirectional strings of length 3. Rather than maintaining 6 orderings, we can use one ring to index them all. This ring replaces the graph and uses only sublinear extra space on top of the graph; in other words, the ring supports worst-case optimal graph joins in almost no space beyond that needed to store the graph itself. We perform experiments using our representation to index a large graph (Wikidata) in memory, over which wco join algorithms are implemented. Our experiments show that the ring offers the best overall query-time performance while using only a small fraction of the space of several state-of-the-art approaches.
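To see why a single cyclic, bidirectional view of a triple can stand in for all 6 index orders, the Python sketch below takes the three forward rotations of (s, p, o) and the three rotations of its reverse, checks that together they are every permutation, and then answers a triple pattern by one range scan over plain sorted lists, one per order. This is a toy model of the conventional six-order approach under our own assumptions (triples as string tuples, patterns as dicts of bound roles). The helper names `rotations`, `build_indexes`, `choose_order` and `match` are hypothetical, and the ring's compact single-structure representation is not reproduced here.

```python
from bisect import bisect_left
from itertools import permutations

# Toy sketch (not the ring itself): the six index orders a conventional
# store keeps for (s, p, o) triples are exactly the three cyclic rotations
# of "spo" read forward plus the three rotations read backward.
ROLES = ("s", "p", "o")
POS = {"s": 0, "p": 1, "o": 2}


def rotations(seq):
    """All cyclic rotations of a tuple."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


FORWARD = rotations(ROLES)          # spo, pos, osp
BACKWARD = rotations(ROLES[::-1])   # ops, pso, sop
ALL_ORDERS = FORWARD + BACKWARD


def build_indexes(triples):
    """Conventional approach (hypothetical helper): one sorted copy per order."""
    return {
        order: sorted(tuple(t[POS[r]] for r in order) for t in triples)
        for order in ALL_ORDERS
    }


def choose_order(bound):
    """Return an order whose first len(bound) roles are exactly the bound
    roles, so all matching triples form one contiguous range in it."""
    k = len(bound)
    for order in ALL_ORDERS:
        if set(order[:k]) == set(bound):
            return order
    raise ValueError("no suitable order")  # unreachable for s/p/o


def match(indexes, pattern):
    """Answer a triple pattern (dict role -> constant) by a range scan."""
    bound = [r for r in ROLES if r in pattern]
    order = choose_order(bound)
    key = tuple(pattern[r] for r in order[: len(bound)])
    index = indexes[order]
    out = []
    for row in index[bisect_left(index, key):]:
        if row[: len(key)] != key:
            break  # left the contiguous range of matches
        values = dict(zip(order, row))
        out.append((values["s"], values["p"], values["o"]))
    return out


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "likes", "carol"),
        ("carol", "knows", "alice"),
    ]
    indexes = build_indexes(triples)
    print(sorted(ALL_ORDERS) == sorted(permutations(ROLES)))  # True
    print(match(indexes, {"p": "knows", "o": "carol"}))  # [('bob', 'knows', 'carol')]
```

In this toy model, every combination of bound positions appears as a prefix of some forward or backward rotation, which is why a structure that can walk a cyclic triple in both directions can cover what would otherwise take six sorted copies.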
