R3F: RDF triple filtering method for efficient SPARQL query processing

With the rapid growth in the amount of graph-structured Resource Description Framework (RDF) data, SPARQL query processing has received significant attention. The most important part of SPARQL query processing is subgraph pattern matching. For this, most RDF stores use relation-based approaches, which can produce a vast number of redundant intermediate results during query evaluation. To address this problem, we propose an RDF Triple Filtering (R3F) method that exploits the graph-structural information of RDF data. We design a path-based index called the RDF Path index (RP-index) to efficiently provide filter data for the triple filtering. We also propose a relational operator called the RDF Filter (RFLT) that conducts the triple filtering with little overhead compared to the original query processing. Through comprehensive experiments on large-scale RDF datasets, we demonstrate that R3F effectively and efficiently reduces the number of redundant intermediate results and improves query performance.
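The idea of path-based triple filtering can be illustrated with a small sketch. This is not the RP-index or the RFLT operator themselves, whose details the abstract does not give; it assumes, for illustration only, that the filter data maps each incoming predicate path (up to a fixed length) to the set of vertices the path leads into, and that a triple is discarded when its subject is missing from the set required by the query. The names `build_path_index` and `filter_triples` are hypothetical.

```python
from collections import defaultdict


def build_path_index(triples, max_len=2):
    """Illustrative path index (an assumption, not the paper's RP-index).

    Maps each predicate path, as a tuple of predicates of length <= max_len,
    to the set of vertices that the path leads into.
    """
    out_edges = defaultdict(list)
    frontier = defaultdict(set)  # path -> vertices reached by that path
    for s, p, o in triples:
        out_edges[s].append((p, o))
        frontier[(p,)].add(o)

    index = defaultdict(set)
    for _ in range(max_len):
        for path, ends in frontier.items():
            index[path] |= ends
        # Extend every path by one more outgoing edge.
        longer = defaultdict(set)
        for path, ends in frontier.items():
            for v in ends:
                for p, o in out_edges[v]:
                    longer[path + (p,)].add(o)
        frontier = longer
    return index


def filter_triples(candidates, incoming_path, index):
    """Keep only triples whose subject is reached by incoming_path."""
    allowed = index.get(incoming_path, set())
    return [t for t in candidates if t[0] in allowed]


# Toy data (made up for this sketch).
triples = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "mit"),
    ("carol", "worksFor", "cmu"),
    ("dave", "advisor", "erin"),
    ("erin", "worksFor", "kaist"),
    ("frank", "worksFor", "snu"),
]
index = build_path_index(triples, max_len=2)

# Query: ?x advisor ?y . ?y worksFor ?z
# Candidates for the second pattern are all worksFor triples; only those
# whose subject has an incoming "advisor" edge can contribute to a result.
candidates = [t for t in triples if t[1] == "worksFor"]
filtered = filter_triples(candidates, ("advisor",), index)
print(filtered)
```

In this toy example the filter drops the `carol` and `frank` triples before any join is evaluated, which is the kind of redundant intermediate result the abstract refers to.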
