A Framework for Top-K Queries over Weighted RDF Graphs

The Resource Description Framework (RDF) is a specification for the conceptual modeling of metadata, or information about resources, as a directed graph composed of triples of knowledge (facts). RDF also provides a mechanism, called reification, for encoding meta-information (such as source, trust, and certainty) about facts that already exist in a knowledge base. In this thesis, an extension to the current RDF specification is proposed that enhances RDF triples with an application-specific weight (cost). Unlike reification, this extension treats the additional weights as first-class knowledge attributes in the RDF model, which the underlying query engine can leverage. In addition, current RDF query languages, such as SPARQL, have limited expressive power, which restricts the capabilities of the applications that use them. Even where language extensions exist, current RDF stores do not provide methods and tools for processing the extended queries efficiently and effectively. To overcome these limitations, a set of novel primitives for the SPARQL language is proposed to express top-k queries using traditional query patterns as well as novel predicates inspired by those of the XPath language. An extended query processing engine is also developed to support efficient ranked path search, join, and indexing.
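To make the idea of weighted triples and ranked path search concrete, the following is a minimal illustrative sketch, not the thesis's query engine or its SPARQL extension. It assumes that a weighted triple can be written as a 4-tuple (subject, predicate, object, weight), that weights are non-negative costs, and that a path's cost is the sum of its edge weights. The function name `top_k_paths` and the `ex:` resources are hypothetical. The search is a plain best-first enumeration of loop-free paths with a priority queue, chosen for brevity rather than efficiency.

```python
import heapq
from collections import defaultdict


def top_k_paths(triples, source, target, k):
    """Return up to k cheapest loop-free paths from source to target.

    Each triple is (subject, predicate, object, weight); weights are
    assumed to be non-negative, and a path's cost is the sum of its weights.
    """
    # Index outgoing edges by subject for fast expansion.
    out_edges = defaultdict(list)
    for s, p, o, w in triples:
        out_edges[s].append((p, o, w))

    # Heap entries: (cost so far, visited nodes, predicates used).
    heap = [(0.0, [source], [])]
    results = []
    while heap and len(results) < k:
        cost, nodes, preds = heapq.heappop(heap)
        last = nodes[-1]
        if last == target:
            # With non-negative weights, complete paths are popped
            # in non-decreasing order of cost.
            results.append((cost, nodes, preds))
            continue
        for p, o, w in out_edges[last]:
            if o not in nodes:  # keep paths loop-free
                heapq.heappush(heap, (cost + w, nodes + [o], preds + [p]))
    return results


# Hypothetical weighted triples (lower weight = cheaper edge).
triples = [
    ("ex:a", "ex:knows", "ex:b", 1.0),
    ("ex:b", "ex:knows", "ex:d", 1.0),
    ("ex:a", "ex:cites", "ex:c", 0.5),
    ("ex:c", "ex:cites", "ex:d", 2.0),
    ("ex:a", "ex:worksWith", "ex:d", 4.0),
]

for cost, nodes, preds in top_k_paths(triples, "ex:a", "ex:d", 2):
    print(cost, " -> ".join(nodes))
```

Because the weight is part of each tuple rather than a set of separate reification statements, the search can read it directly while expanding a path, which is the kind of access a weight-aware query engine would rely on.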
