k-nearest keyword search in RDF graphs

The Resource Description Framework (RDF) is a W3C standard widely used to describe resources in the Semantic Web. A standard SPARQL query over RDF data requires the query issuer to fully understand the domain knowledge behind the data. As a result, SPARQL queries are inflexible, and non-experts find it difficult to write them without knowing the underlying data domain. Motivated by this problem, in this paper we propose and tackle a novel and important query type, the k-nearest keyword (k-NK) query, over a large RDF graph. Specifically, given two keywords q and w, a k-NK query returns the k closest pairs of vertices (v_i, u_i) in the RDF graph such that v_i contains q, u_i contains w, and u_i is the nearest vertex to v_i that contains w. To answer k-NK queries efficiently, we design effective pruning methods for RDF graphs both with and without schema, which greatly reduce the query search space. Moreover, to support these pruning strategies, we propose effective indexing mechanisms on RDF graphs with and without schema that enable fast k-NK query answering. Through extensive experiments, we demonstrate the efficiency and effectiveness of our proposed k-NK query processing approaches.
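
To make the query definition concrete, the following is a minimal brute-force sketch of a k-NK query. It is not the pruning or indexing approach proposed in the paper. It assumes, for illustration only, an undirected and unweighted graph where distance is the number of hops, and it assumes that a vertex containing both keywords may be paired with itself at distance 0. The names `build_adj`, `nearest_with_keyword`, and `k_nk_query`, as well as the toy graph, are hypothetical.

```python
from collections import deque
import heapq


def build_adj(edges):
    """Build an undirected adjacency map from (a, b) edge pairs (assumption: edges are symmetric)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def nearest_with_keyword(adj, keywords, source, w):
    """Breadth-first search from source; return (hops, vertex) for the closest vertex containing w."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if w in keywords.get(node, ()):
            return dist, node
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no reachable vertex contains w


def k_nk_query(adj, keywords, q, w, k):
    """Return up to k triples (distance, v, u), closest first, where v contains q and
    u is the nearest vertex to v that contains w."""
    pairs = []
    for v in adj:
        if q in keywords.get(v, ()):
            hit = nearest_with_keyword(adj, keywords, v, w)
            if hit is not None:
                dist, u = hit
                pairs.append((dist, v, u))
    return heapq.nsmallest(k, pairs)


# Toy data (hypothetical): two vertices mention "paper", two mention "author".
edges = [("p1", "x"), ("x", "a1"), ("x", "p2"), ("p2", "a2")]
keywords = {"p1": {"paper"}, "p2": {"paper"}, "a1": {"author"}, "a2": {"author"}}
adj = build_adj(edges)
result = k_nk_query(adj, keywords, "paper", "author", 2)
print(result)
```

This baseline runs one breadth-first search per vertex containing q, which is exactly the kind of search cost that pruning and indexing are meant to reduce.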
