A graph method for keyword-based selection of the top-K databases

While database management systems offer a comprehensive solution to data storage, they require deep knowledge of the schema, as well as of the data manipulation language, in order to perform effective retrieval. Since these requirements pose a problem for lay or occasional users, several methods incorporate keyword search (KS) into relational databases. However, most of the existing techniques focus on querying a single DBMS. On the other hand, the proliferation of distributed databases in several conventional and emerging applications necessitates support for keyword-based data sharing and querying over multiple DBMSs. In order to avoid the high cost of searching in numerous, potentially irrelevant, databases in such systems, we propose G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query. G-KS summarizes each database by a keyword relationship graph, where nodes represent terms and edges describe relationships between them. Keyword relationship graphs are utilized for computing the similarity between each database and a KS query, so that, during query processing, only the most promising databases are searched. An extensive experimental evaluation demonstrates that G-KS outperforms the current state-of-the-art technique in all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.
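
The selection idea described above can be illustrated with a minimal Python sketch. It is not the G-KS algorithm itself: the abstract does not specify how edges are weighted or how similarity is computed, so the sketch assumes that two terms are related when they co-occur in the same row, that a database lacking any query keyword scores zero, and that similarity is the sum of edge weights between query keyword pairs. The names build_krg, score and top_k, as well as the toy data, are hypothetical.

```python
from collections import Counter
from itertools import combinations
import heapq


def build_krg(rows):
    """Summarize a database (here: a list of text rows) as a keyword graph.

    Assumption: an edge links two terms that co-occur in the same row,
    weighted by the number of rows in which they co-occur.
    """
    nodes, edges = set(), Counter()
    for row in rows:
        terms = sorted(set(row.lower().split()))
        nodes.update(terms)
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return nodes, edges


def score(krg, query):
    """Assumed similarity between a keyword graph and a keyword query."""
    nodes, edges = krg
    terms = sorted({k.lower() for k in query})
    # A database missing any query keyword cannot answer the query.
    if not terms or any(t not in nodes for t in terms):
        return 0
    if len(terms) == 1:
        return 1
    # Sum the weights of the edges between every pair of query keywords.
    return sum(edges[(a, b)] for a, b in combinations(terms, 2))


def top_k(summaries, query, k):
    """Return up to k database names with a non-zero score, best first."""
    scored = ((score(krg, query), name) for name, krg in summaries.items())
    return [name for s, name in heapq.nlargest(k, scored) if s > 0]


# Toy data: three tiny "databases", each a list of rows.
summaries = {
    "pubs": build_krg(["keyword search graph", "graph database"]),
    "shop": build_krg(["coffee beans", "graph paper"]),
    "hr": build_krg(["employee salary"]),
}
selected = top_k(summaries, ["graph", "search"], 2)
```

Only the summaries are consulted when ranking, so the databases themselves are touched only after selection, which is the cost saving the abstract motivates.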
