Partitioned Indexes for Entity Search over RDF Knowledge Bases

The rapid growth of RDF data in RDF knowledge bases calls for efficient query processing techniques. This paper focuses on the star-style SPARQL join queries, which is very common when users want to search information of entities from RDF knowledge bases. We observe that the computational cost of such queries mainly comes from loading a large portion of predicate-ahead indexes. We therefore propose to partition the whole RDF knowledge bases based on the schema of individual entities, so that only entities of similar schemas are allocated into the same cluster. Such a partitioning strategy generates a pruning mechanism that effectively isolate the correlations of partitions and the queries. Consequently, queries are only conducted over a small number of partitions with small predicate-ahead indexes. Experiments over a large real-life RDF data set show the significant performance improvements achieved by our partitioned indexing techniques.

[1]  J. Carroll,et al.  Jena: implementing the semantic web recommendations , 2004, WWW Alt. '04.

[2]  Tim Berners-Lee,et al.  Linked data on the web (LDOW2008) , 2008, WWW.

[3]  Abraham Bernstein,et al.  Hexastore: sextuple indexing for semantic web data management , 2008, Proc. VLDB Endow..

[4]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[5]  Martin L. Kersten,et al.  Column-store support for RDF data management: not all swans are white , 2008, Proc. VLDB Endow..

[6]  James A. Hendler,et al.  Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data , 2010, WWW '10.

[7]  Andreas Harth,et al.  Optimized index structures for querying RDF from the Web , 2005, Third Latin American Web Congress (LA-WEB'2005).

[8]  Daniel J. Abadi,et al.  Scalable Semantic Web Data Management Using Vertical Partitioning , 2007, VLDB.

[9]  V. S. Subrahmanian,et al.  DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases , 2009, SEMWEB.

[10]  John Benjafield,et al.  Cognition, 3rd ed. , 2007 .

[11]  Simon Schenk,et al.  Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-Joins , 2008, SEMWEB.

[12]  Gerhard Weikum,et al.  Scalable join processing on very large RDF graphs , 2009, SIGMOD Conference.

[13]  Frank van Harmelen,et al.  Sesame: An Architecture for Storin gand Querying RDF Data and Schema Information , 2003, Spinning the Semantic Web.

[14]  Sergios Theodoridis,et al.  Pattern Recognition, Third Edition , 2006 .

[15]  Manolis Koubarakis,et al.  Atlas: Storing, updating and querying RDF(S) data on top of DHTs , 2010, J. Web Semant..

[16]  Liyang Yu Linked Open Data , 2011 .

[17]  Dave Reynolds,et al.  Efficient RDF Storage and Retrieval in Jena2 , 2003, SWDB.

[18]  Gerhard Weikum,et al.  RDF-3X: a RISC-style engine for RDF , 2008, Proc. VLDB Endow..

[19]  Zhihua Zhang,et al.  Matrix-Variate Dirichlet Process Mixture Models , 2010, AISTATS.

[20]  Abraham Bernstein,et al.  The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings , 2009, SEMWEB.