SPARK2: Top-k Keyword Query in Relational Databases

With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of a virtual document. We also propose several efficient query processing methods for the new ranking method. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvements over the alternative approaches in terms of retrieval effectiveness and efficiency.
