Quark-X: An Efficient Top-K Processing Framework for RDF Quad Stores

There is a growing trend towards enriching RDF content from its classical Subject-Predicate-Object triple form to an annotated representation that can model richer relationships, such as fact provenance, fact confidence, and higher-order relationships. One of the recommended ways to achieve this is reification, represented as N-Quads (or simply quads), where an additional identifier is associated with the entire RDF statement and can then be used to add further annotations. A typical use of such annotations is to attach quantifiable confidence values to facts. In such settings, it is important to support efficient top-k queries, typically over user-defined ranking functions that include statement-level confidence values in addition to other quantifiable values in the database. In this paper, we present Quark-X, an RDF store and SPARQL processing system for reified RDF data represented in the form of quads. This paper presents the overall architecture of our system, illustrating the modifications that must be made to a native quad store for it to process top-k queries. In Quark-X, we propose indexing and query-processing techniques that make top-k querying efficient. In addition, we present the results of a comprehensive empirical evaluation of our system over the Yago2S and DBpedia datasets. Our performance study shows that the proposed method achieves a speed-up of one to two orders of magnitude over baseline solutions.
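The abstract does not detail Quark-X's own indexes or query operators. As a generic illustration only, the sketch below shows how confidence values attached to a statement through its quad identifier can feed a user-defined, monotone ranking function, and how the classic Threshold Algorithm (Fagin et al.) can stop scanning score-sorted lists before reading every candidate. This is not the Quark-X method; the data, the predicate names `confidence` and `support`, the weights in `rank`, and the helpers `annotation` and `threshold_topk` are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# A quad is (subject, predicate, object, statement_id): the fourth field is the
# identifier that reification attaches to the whole statement.
Quad = Tuple[str, str, str, str]

# Hypothetical data: four facts plus annotation quads whose subject is a fact's
# statement id. The predicates "confidence" and "support" are made up.
quads: List[Quad] = [
    ("Einstein", "bornIn", "Ulm", "t1"),
    ("Curie", "bornIn", "Warsaw", "t2"),
    ("Turing", "bornIn", "London", "t3"),
    ("Noether", "bornIn", "Erlangen", "t4"),
    ("t1", "confidence", "0.9", "a1"), ("t1", "support", "0.2", "a2"),
    ("t2", "confidence", "0.8", "a3"), ("t2", "support", "0.9", "a4"),
    ("t3", "confidence", "0.4", "a5"), ("t3", "support", "0.5", "a6"),
    ("t4", "confidence", "0.3", "a7"), ("t4", "support", "0.1", "a8"),
]


def annotation(quads: List[Quad], pred: str) -> Dict[str, float]:
    """Map each annotated statement id to the numeric value of `pred`."""
    return {s: float(o) for s, p, o, _ in quads if p == pred}


def threshold_topk(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    rank: Callable[[float, float], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Threshold Algorithm over two score attributes.

    Assumes every item appears in both maps and that `rank` is monotone
    (non-decreasing in each argument); otherwise the stopping test is unsafe.
    """
    list_a = sorted(scores_a.items(), key=lambda kv: kv[1], reverse=True)
    list_b = sorted(scores_b.items(), key=lambda kv: kv[1], reverse=True)
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    seen = set()
    for (item_a, a), (item_b, b) in zip(list_a, list_b):
        # Sorted access: one entry from each list per round.
        for item in (item_a, item_b):
            if item in seen:
                continue
            seen.add(item)
            # Random access: look up the item's other attribute directly.
            score = rank(scores_a[item], scores_b[item])
            if len(heap) < k:
                heapq.heappush(heap, (score, item))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, item))
        # Upper bound on the score of any item not yet seen.
        threshold = rank(a, b)
        if len(heap) == k and heap[0][0] >= threshold:
            break
    return sorted(heap, reverse=True)


def rank(conf: float, supp: float) -> float:
    """A user-defined, monotone ranking function (weights are arbitrary)."""
    return 0.7 * conf + 0.3 * supp


confidence = annotation(quads, "confidence")
support = annotation(quads, "support")
triples = {sid: (s, p, o) for s, p, o, sid in quads if p == "bornIn"}

top = threshold_topk(confidence, support, rank, k=2)
for score, sid in top:
    print(sid, triples[sid], round(score, 2))
```

On this toy input the scan stops after the third round of sorted access, before `t4` is ever read, whereas a baseline that scores every candidate and then sorts touches all four. The early stop is only valid because `rank` is monotone, which is why the sketch states that as an assumption.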

Srikanta J. Bedathur | Debajyoti Bera | Jyoti Leeka | Medha Atre