Quark-X: An Efficient Top-K Processing Framework for RDF Quad Stores

There is a growing trend towards enriching RDF content from its classical Subject-Predicate-Object triple form to an annotated representation that can model richer relationships, such as fact provenance, fact confidence, and higher-order relationships. One of the recommended ways to achieve this is to use reification and represent the data as N-Quads (or simply quads), where an additional identifier is associated with the entire RDF statement and can then be used to add further annotations. A typical use of such annotations is to attach quantifiable confidence values to facts. In such settings, it is important to support efficient top-k queries, typically over user-defined ranking functions that combine sentence-level confidence values with other quantifiable values in the database. In this paper, we present Quark-X, an RDF store and SPARQL processing system for reified RDF data represented in the form of quads. We describe the overall architecture of the system, illustrating the modifications that need to be made to a native quad store for it to process top-k queries, and we propose indexing and query processing techniques that make top-k querying efficient. In addition, we present the results of a comprehensive empirical evaluation of our system over the Yago2S and DBpedia datasets. Our performance study shows that the proposed method achieves a speed-up of one to two orders of magnitude over baseline solutions.
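
To make the setting concrete, the following is a minimal sketch of a top-k query over quads ranked by a confidence annotation. It is a naive in-memory scan for illustration only, not the indexing or query processing technique used by Quark-X. The `Quad` type, the `confidence` mapping, the `top_k` helper, and the sample facts are all hypothetical names and data introduced here.

```python
import heapq
from typing import NamedTuple


class Quad(NamedTuple):
    """A reified RDF statement: a triple plus a statement identifier (hypothetical type)."""
    stmt_id: str
    subject: str
    predicate: str
    obj: str


# Illustrative sample data, not taken from any real dataset.
quads = [
    Quad("s1", "Alice", "worksAt", "AcmeCorp"),
    Quad("s2", "Bob", "worksAt", "AcmeCorp"),
    Quad("s3", "Carol", "worksAt", "Initech"),
]

# Annotations keyed by statement identifier: here, one confidence value per fact.
confidence = {"s1": 0.9, "s2": 0.4, "s3": 0.75}


def top_k(quads, confidence, predicate, k):
    """Return the k quads matching `predicate` with the highest confidence.

    This scans every quad, which is the kind of baseline a dedicated
    top-k engine aims to avoid.
    """
    matches = (q for q in quads if q.predicate == predicate)
    return heapq.nlargest(k, matches, key=lambda q: confidence.get(q.stmt_id, 0.0))


result = top_k(quads, confidence, "worksAt", 2)
for q in result:
    print(q.stmt_id, q.subject, q.obj, confidence[q.stmt_id])
```

A user-defined ranking function would replace the `key` argument, for example by combining the confidence with another numeric attribute of the matched statement.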
