Finding and approximating top-k answers in keyword proximity search

Various approaches for keyword proximity search have been implemented in relational databases, XML and the Web. Yet, in all of them, an answer is a Q-fragment, namely, a subtree T of the given data graph G such that T contains all the keywords of the query Q and has no proper subtree with this property. The rank of an answer is inversely proportional to its weight. Three problems are of interest: finding an optimal (i.e., top-ranked) answer, computing the top-k answers, and enumerating all the answers in ranked order. It is shown that, under data complexity, an efficient algorithm for solving the first problem is sufficient for solving the other two problems with polynomial delay. Similarly, under query-and-data complexity, an efficient algorithm for finding a θ-approximation of the optimal answer suffices for carrying out two tasks with polynomial delay: enumerating the answers in a (θ+1)-approximate order, and computing a (θ+1)-approximation of the top-k answers. As a corollary, this paper gives the first efficient algorithms, under data complexity, for enumerating all the answers in ranked order and for computing the top-k answers. It also gives the first efficient algorithms, under query-and-data complexity, for enumerating in a provably approximate order and for computing an approximation of the top-k answers.
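The general idea of turning an "optimal answer" routine into ranked enumeration can be illustrated with a generic Lawler-style partitioning procedure. The Python sketch below is illustrative only and is not claimed to be this paper's construction. It assumes a hypothetical oracle that returns a minimum-weight answer subject to a set of required edges and a set of forbidden edges. Each answer is represented as a frozenset of edge labels. The sketch also assumes that no answer's edge set properly contains another's, which holds for Q-fragments by minimality. How such a constrained oracle is obtained from an unconstrained optimal-answer algorithm is not shown here. Each emitted answer triggers at most one oracle call per edge of that answer, so the delay is polynomial whenever the oracle runs in polynomial time. `make_brute_force_oracle` is a hypothetical stand-in for testing only.

```python
import heapq
from itertools import count, islice
from typing import Callable, FrozenSet, Iterator, List, Optional, Tuple

Answer = FrozenSet[str]
# Hypothetical constrained-optimum oracle: (required edges, forbidden edges)
# -> (weight, answer) of a minimum-weight feasible answer, or None if none exists.
Oracle = Callable[[FrozenSet[str], FrozenSet[str]], Optional[Tuple[float, Answer]]]


def lawler_enumerate(oracle: Oracle) -> Iterator[Tuple[float, Answer]]:
    """Lazily yield answers in nondecreasing weight (Lawler-style sketch).

    Assumes the answers form an antichain: no answer's edge set properly
    contains another's (true for Q-fragments, by minimality).
    """
    tie = count()  # tie-breaker so heapq never has to compare frozensets
    heap: list = []
    first = oracle(frozenset(), frozenset())
    if first is not None:
        heapq.heappush(heap, (first[0], next(tie), first[1], frozenset(), frozenset()))
    while heap:
        weight, _, answer, inc, exc = heapq.heappop(heap)
        yield weight, answer
        # Split the rest of subspace (inc, exc) into disjoint subspaces: the j-th
        # one requires the first j free edges of `answer` and forbids the next one.
        free = sorted(answer - inc)
        for j, edge in enumerate(free):
            sub_inc = inc | frozenset(free[:j])
            sub_exc = exc | {edge}
            best = oracle(sub_inc, sub_exc)
            if best is not None:
                heapq.heappush(heap, (best[0], next(tie), best[1], sub_inc, sub_exc))


def make_brute_force_oracle(answers: List[Tuple[float, Answer]]) -> Oracle:
    """Hypothetical test oracle that scans an explicit list of answers."""
    def oracle(inc: FrozenSet[str], exc: FrozenSet[str]) -> Optional[Tuple[float, Answer]]:
        feasible = [(w, a) for w, a in answers if inc <= a and not (a & exc)]
        # Break weight ties deterministically by the sorted edge labels.
        return min(feasible, key=lambda wa: (wa[0], sorted(wa[1])), default=None)
    return oracle


if __name__ == "__main__":
    # Toy answers (edge-label sets forming an antichain) with made-up weights.
    toy = [
        (3.0, frozenset({"a-b", "b-c"})),
        (2.0, frozenset({"a-d", "d-c"})),
        (5.0, frozenset({"a-e", "e-c"})),
        (4.0, frozenset({"a-b", "b-f", "f-c"})),
    ]
    oracle = make_brute_force_oracle(toy)
    top2 = [w for w, _ in islice(lawler_enumerate(oracle), 2)]
    print(top2)  # [2.0, 3.0]
```

Because the generator is lazy, the top-k answers are simply its first k items, for example `islice(lawler_enumerate(oracle), k)`. No work is spent on answers beyond the k-th.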
