Paper Information - SPARK2: Top-k Keyword Query in Relational Databases

SPARK2: Top-k Keyword Query in Relational Databases

With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of a virtual document. We also propose several efficient query processing methods for the new ranking method. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvements over the alternative approaches in terms of retrieval effectiveness and efficiency.

[2] Xuemin Lin, et al. SPARK2: Top-k Keyword Query in Relational Databases, 2007, IEEE Transactions on Knowledge and Data Engineering.
