In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top K scores. In practice, the inputs can often be accessed in ranked order and the scoring function is monotonic. These conditions enable efficient algorithms that solve the rank join problem without reading all of the input. In this paper, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it rests on a more general problem statement than previous work, which makes it more relevant to the execution model employed by database systems. One of our results indicates that the well-known HRJN algorithm has shortcomings because it does not stop reading its input as soon as possible. We find that overcoming this weakness is NP-hard in the general case, but that cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input, which allows it to stop as soon as possible. As a result, the algorithm achieves a cost within a constant factor of optimal.
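To make the setting concrete, the following is a minimal Python sketch of an HRJN-style two-way rank join. It assumes an equi-join on a single key, the sum of the two input scores as the monotonic scoring function, and round-robin pulling from the inputs; the function name `rank_join` and the tuple layouts are hypothetical. The stopping test uses the simple "corner bound" threshold associated with HRJN, not the provably tight bound developed in this paper.

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, float]  # (join key, score); each input is sorted by score, descending


def rank_join(left: List[Row], right: List[Row], k: int):
    """HRJN-style two-way rank join with scoring f(l, r) = l + r.

    Returns (top-k results, number of input tuples read). Each result is a
    tuple (combined_score, key, left_score, right_score).
    """
    inputs = (left, right)
    if not left or not right or k <= 0:
        return [], 0
    seen = (defaultdict(list), defaultdict(list))  # per-input hash tables on the join key
    pos = [0, 0]                     # number of tuples read from each input
    top = (left[0][1], right[0][1])  # highest score in each input
    results = []                     # all join results produced so far
    turn = 0

    def corner_bound():
        # Any join result not yet produced uses at least one unread tuple, whose
        # score is at most the last score read from its input (or the top score
        # if nothing has been read); its partner scores at most the other top.
        last = [inputs[s][pos[s] - 1][1] if pos[s] else top[s] for s in (0, 1)]
        bounds = [last[s] + top[1 - s] for s in (0, 1) if pos[s] < len(inputs[s])]
        return max(bounds) if bounds else float("-inf")

    while True:
        results.sort(reverse=True)
        # Stop once no unseen result can beat the current k-th best result.
        if len(results) >= k and results[k - 1][0] >= corner_bound():
            break
        if pos[0] == len(left) and pos[1] == len(right):
            break  # both inputs exhausted
        # Round-robin pulling, skipping an input once it is exhausted.
        s = turn if pos[turn] < len(inputs[turn]) else 1 - turn
        key, score = inputs[s][pos[s]]
        pos[s] += 1
        seen[s][key].append(score)
        # Probe the other input's hash table to produce new join results.
        for other in seen[1 - s][key]:
            l, r = (score, other) if s == 0 else (other, score)
            results.append((l + r, key, l, r))
        turn = 1 - turn
    return results[:k], pos[0] + pos[1]


if __name__ == "__main__":
    left = [("a", 9), ("b", 8), ("c", 3)]
    right = [("b", 10), ("a", 2), ("c", 1)]
    print(rank_join(left, right, 1))
```

Running the script prints `([(18, 'b', 8, 10)], 4)`: the bound lets the join return the top result after reading 4 of the 6 input tuples. Per the abstract, HRJN in general does not stop reading as soon as possible, because a threshold like this corner bound can overestimate what unread input could still contribute.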