An Efficient Index for RDF Query Containment

Query containment is a fundamental operation used to expedite query processing in view materialisation and query caching techniques. Since query containment has been shown to be NP-complete for arbitrary conjunctive queries on RDF graphs, we introduce a simpler form of conjunctive queries that we name f-graph queries. We first show that containment checking for f-graph queries can be solved in polynomial time. Based on this observation, we propose a novel indexing structure, named mv-index, that allows for fast containment checking between a single f-graph query and an arbitrary number of stored queries. Search is performed in polynomial time in the combined size of the query and the index. We then show how our algorithms and structures can be extended to arbitrary conjunctive queries on RDF graphs by introducing f-graph witnesses, i.e., f-graph representatives of conjunctive queries. F-graph witnesses have the following interesting property: a conjunctive query for RDF graphs is contained in another query only if its corresponding f-graph witness is also contained in it. This property allows us to use our indexing structure for the general case of conjunctive query containment. In practice, this translates to microseconds or less for a containment test against hundreds of thousands of queries indexed within our structure.
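To make the notion of containment concrete, the following minimal Python sketch checks containment between two Boolean conjunctive queries over triple patterns using the classical homomorphism criterion: q1 is contained in q2 if the variables of q2 can be mapped to terms of q1 so that every triple pattern of q2 lands on a triple pattern of q1. This is not the f-graph algorithm or the mv-index described above; it is the general backtracking test, which may take exponential time in the worst case. The function names, the "?"-prefix convention for variables, and the example queries are assumptions made for illustration, and projection, blank nodes, and RDFS entailment are ignored.

```python
from typing import Dict, List, Optional, Tuple

# A triple pattern is (subject, predicate, object); variables start with "?".
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def find_homomorphism(src: List[Triple], dst: List[Triple]) -> Optional[Dict[str, str]]:
    """Backtracking search for a mapping from src's variables to dst's terms
    that sends every triple pattern of src onto some triple pattern of dst.
    Constants must match exactly. Returns the mapping, or None if none exists."""
    dst_set = set(dst)

    def extend(i: int, mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(src):
            return mapping  # every pattern of src has been mapped
        for candidate in dst_set:
            trial = dict(mapping)
            ok = True
            for s_term, d_term in zip(src[i], candidate):
                if is_var(s_term):
                    # Bind the variable, or check consistency with an earlier binding.
                    if trial.setdefault(s_term, d_term) != d_term:
                        ok = False
                        break
                elif s_term != d_term:
                    ok = False  # a constant can only map to itself
                    break
            if ok:
                result = extend(i + 1, trial)
                if result is not None:
                    return result
        return None

    return extend(0, {})


def is_contained(q1: List[Triple], q2: List[Triple]) -> bool:
    """q1 is contained in q2 iff a homomorphism from q2 into q1 exists."""
    return find_homomorphism(q2, q1) is not None


# Hypothetical example queries (not taken from the paper).
q1 = [("?x", "type", "Student"), ("?x", "advisor", "?y")]
q2 = [("?a", "advisor", "?b")]
```

With these example queries, is_contained(q1, q2) holds because q1 is the more restrictive query, while is_contained(q2, q1) does not, since q2 has no pattern that the "type" pattern of q1 could be mapped onto.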
