XML-based RDF data management for efficient query processing

The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. Harnessing this search power requires robust and scalable data repositories that can store RDF data and support efficient evaluation of SPARQL queries. Most existing RDF storage techniques rely on the relational model and relational database technologies for these tasks. They either keep the RDF data as triples or decompose it into multiple relations. The mismatch between the graph model of RDF data and the rigid two-dimensional tables of the relational model jeopardizes the scalability of such repositories and frequently renders a repository inefficient for some types of data and queries. We propose to decompose an RDF graph into a forest of semantically correlated XML trees, store them in an XML repository, and rewrite SPARQL queries into XPath/XQuery queries to be evaluated in the XML repository. In this paper, we discuss the basic idea of RDF-to-XML decomposition and the criteria for such decomposition in terms of correctness, redundancy, and query efficiency, and then propose two RDF-to-XML decomposition algorithms based on these criteria. Our experimental results indicate that our approach can improve both storage efficiency and query processing efficiency compared to existing RDF techniques.
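To make the general idea concrete, the following is a minimal sketch of turning a small RDF graph into a forest of XML trees and answering a path-shaped query with an XPath expression. It is not one of the two decomposition algorithms proposed in the paper: the toy triples, the element names `forest` and `resource`, the rooting rule (start a tree at every subject that never occurs as an object), and the helper names `decompose` and `employer_names_of_acquaintances` are all assumptions made for illustration. The sketch also ignores the redundancy criterion, since a resource reachable from several roots is simply copied into each tree.

```python
import xml.etree.ElementTree as ET

# Toy RDF graph as (subject, predicate, object) triples.
# All identifiers here are hypothetical illustration data.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("bob", "worksFor", "acme"),
    ("acme", "name", "Acme Corp"),
]


def decompose(triples):
    """Naively decompose a triple set into a forest of XML trees.

    Assumption: a tree is rooted at every subject that never appears
    as an object. A purely cyclic graph would yield an empty forest,
    which a real decomposition algorithm would have to handle.
    """
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    objects = {o for _, _, o in triples}
    roots = [s for s in outgoing if s not in objects]

    def build(subject, path):
        node = ET.Element("resource", {"id": subject})
        for p, o in outgoing.get(subject, []):
            edge = ET.SubElement(node, p)
            if o in outgoing and o not in path:
                # The object is itself a resource: nest its subtree.
                edge.append(build(o, path | {o}))
            else:
                # Literal value, or a back-reference that would close a cycle.
                edge.text = o
        return node

    forest = ET.Element("forest")
    for root in roots:
        forest.append(build(root, {root}))
    return forest


def employer_names_of_acquaintances(forest):
    """XPath counterpart of the SPARQL pattern
    SELECT ?n WHERE { ?x knows ?y . ?y worksFor ?z . ?z name ?n }
    """
    path = ".//resource/knows/resource/worksFor/resource/name"
    return [e.text for e in forest.findall(path)]


if __name__ == "__main__":
    forest = decompose(TRIPLES)
    print(ET.tostring(forest, encoding="unicode"))
    print(employer_names_of_acquaintances(forest))
```

In this sketch the three-way join of the SPARQL pattern becomes a single downward path traversal, which is the kind of query the XML representation is meant to favor. Patterns that do not follow the nesting direction of the trees would need a different rewriting.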
