R2DF framework for ranked path queries over weighted RDF graphs

The Resource Description Framework (RDF) is a Semantic Web specification that supports conceptual modeling of information about resources in the form of triples of facts. In this paper, we note that, although RDF provides mechanisms to encode meta-information (such as source, trust, or certainty) about the facts recorded in a knowledge base, existing RDF query languages and RDF stores fail to support key primitives needed by a large class of knowledge applications that associate utilities or costs with the available knowledge statements. To address this shortcoming, we propose a novel R2DF framework for utility-ranked resource descriptions. We first propose a simple ranked RDF (R2DF) specification that enhances RDF triples with an application-specific weight (e.g., cost). We then propose the SPARankQL query language specification, which adds a set of novel primitives on top of the SPARQL language to express top-k queries using traditional query patterns as well as novel flexible path predicates. An extended query processing engine, AR2Q, leverages novel index structures to support efficient ranked path search and includes query optimization strategies based on two key metrics: (a) proximity and (b) sub-result inter-arrival time. Experiments show that these two metrics have a significant impact on the performance of top-k queries over R2DF graphs: the proximity measure helps reduce the number of path matches that need to be considered, whereas the inter-arrival measure significantly reduces the overall execution time, especially when used together with proximity. The proposed strategies help obtain query plans that are close to optimal.
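To make the idea of a ranked path query over weighted triples concrete, the following is a minimal Python sketch. It is not the AR2Q engine and does not use SPARankQL syntax, its index structures, or the proximity and inter-arrival metrics described above. It assumes that each triple carries a non-negative weight interpreted as an additive cost, and it enumerates loop-free paths in order of increasing total cost with a plain best-first search. The function name `top_k_paths`, the tuple layout, and the sample data are all hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_paths(triples, source, target, k):
    """Return up to k lowest-cost loop-free paths from source to target.

    triples: iterable of (subject, predicate, object, weight) tuples.
    Weights are assumed to be non-negative additive costs (an assumption
    of this sketch, not something the abstract specifies).
    """
    # Index outgoing edges by subject.
    adjacency = defaultdict(list)
    for s, p, o, w in triples:
        adjacency[s].append((p, o, w))

    # Heap entries: (cost so far, tie-breaker, current node, path, visited).
    # The integer tie-breaker keeps the heap from comparing paths or sets.
    counter = 0
    heap = [(0.0, counter, source, (), frozenset([source]))]
    results = []

    while heap and len(results) < k:
        cost, _, node, path, visited = heapq.heappop(heap)
        if node == target:
            # With non-negative weights, complete paths are popped in
            # non-decreasing order of total cost.
            results.append((cost, list(path)))
            continue
        for p, o, w in adjacency[node]:
            if o in visited:  # keep paths loop-free
                continue
            counter += 1
            heapq.heappush(
                heap,
                (cost + w, counter, o, path + ((node, p, o),), visited | {o}),
            )
    return results


# Hypothetical weighted triples: (subject, predicate, object, cost).
example_triples = [
    ("a", "knows", "b", 1.0),
    ("b", "knows", "d", 1.0),
    ("a", "cites", "c", 0.5),
    ("c", "cites", "d", 2.0),
    ("a", "knows", "d", 5.0),
]

for total_cost, edges in top_k_paths(example_triples, "a", "d", 2):
    print(total_cost, edges)
```

This exhaustive search can expand a very large number of partial paths on dense graphs, which is the kind of cost the index structures and optimization metrics mentioned above are meant to reduce.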
