Top-k join queries: overcoming the curse of anti-correlation

Existing heuristics for top-k join queries aim to minimize the scan depth and rely heavily on the scores and on how the scores are correlated. For two relations of size n with uniformly random scores, a scan depth of √(kn) is known to be required. Moreover, when the selection criteria being optimized are anti-correlated, the required scan depth can reach (n + k)/2. We build a linear-space index that maintains a subset of answers in anticipation of worst-case queries. Using this index, we achieve Õ(√(kn)) join trials, i.e., average-case performance even for worst-case queries. Experimental evaluation shows superior performance against the well-known Rank-Join algorithm.
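
The two scan-depth figures quoted above can be illustrated with a small simulation. The sketch below is not the index or the algorithm of the paper; it rests on a simplified model that is our assumption: a one-to-one join between two score-sorted lists of length n, where depth d counts as sufficient once k joined pairs lie entirely inside the top-d prefixes of both lists. The names `scan_depth`, `random_partners`, and `anti_partners` are hypothetical, chosen only for this illustration.

```python
import math
import random


def scan_depth(partner_rank, k):
    """Smallest depth d at which k joined pairs are fully visible.

    partner_rank[i] is the 0-based rank in S of the tuple joining with the
    rank-i tuple of R (assumption: the join is one-to-one).
    """
    # A pair (i, j) is complete once both sorted lists are scanned past max(i, j).
    completion = sorted(max(i, j) + 1 for i, j in enumerate(partner_rank))
    return completion[k - 1]


n, k = 10_000, 25
rng = random.Random(0)

# Uncorrelated scores: the join partner sits at a uniformly random rank.
random_partners = list(range(n))
rng.shuffle(random_partners)

# Anti-correlated scores: the best tuple of R joins with the worst tuple of S.
anti_partners = list(range(n - 1, -1, -1))

print("random depth:", scan_depth(random_partners, k),
      "vs sqrt(kn) =", math.isqrt(k * n))
print("anti-correlated depth:", scan_depth(anti_partners, k),
      "vs (n + k)/2 =", (n + k) / 2)
```

Under this model, roughly d²/n pairs are complete at depth d when partners are random, which gives d ≈ √(kn). With reversed ranks only 2d − n pairs are complete, so d has to grow to about (n + k)/2.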
