Optimal Incremental Algorithms for Top-k Joins with User-Defined Join Constraints

We investigate the problem of incremental joins of multiple ranked data streams when the join condition is a list of arbitrary user-defined predicates on the input tuples. We propose an algorithm, J∗, for ranked input joins over user-defined join predicates. The basic version of the algorithm uses only sequential access into the database and is easily pipelinable; that is, the output of one join query can be fed as the input of another. We also propose a J∗PA algorithm that can exploit available database indexes for efficient random access based on the join predicates, and we give ε-approximation versions of both algorithms. Finally, we prove strong optimality results for J∗ and its approximate version, and we study their performance empirically.
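The abstract does not spell out how J∗ produces join results incrementally. J∗ is an A∗-style search over partial join combinations, where a priority queue is ordered by an upper bound on the final score. The sketch below illustrates that idea under stated assumptions and is not the paper's exact algorithm. Each input is assumed to be an iterable of `(score, payload)` pairs sorted by descending score, and scores are assumed to be combined by summation (a monotone aggregate). The user-defined constraints are modeled by a hypothetical callable `pred(prefix)` that checks the tuples assigned so far. The names `RankedStream`, `jstar_sketch`, and `pred` are illustrative only. Random access (J∗PA) and approximation are not modeled.

```python
import heapq
import itertools
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Scored = Tuple[float, object]  # (score, payload); each input sorted by score, descending


class RankedStream:
    """Hypothetical wrapper: buffers a ranked input so tuples are read strictly in order."""

    def __init__(self, items: Iterable[Scored]):
        self._it = iter(items)
        self._buf: List[Scored] = []
        self._done = False

    def get(self, j: int) -> Optional[Scored]:
        # Sequential access only: read forward until position j is buffered.
        while len(self._buf) <= j and not self._done:
            try:
                self._buf.append(next(self._it))
            except StopIteration:
                self._done = True
        return self._buf[j] if j < len(self._buf) else None


def jstar_sketch(
    streams: Sequence[Iterable[Scored]],
    pred: Callable[[List[object]], bool],
) -> Iterator[Tuple[float, List[object]]]:
    """Yield valid join combinations in non-increasing order of summed score."""
    inputs = [RankedStream(s) for s in streams]
    m = len(inputs)
    tops = []
    for s in inputs:
        first = s.get(0)
        if first is None:
            return  # an empty input makes the join empty
        tops.append(first[0])
    # suffix_max[i] = best possible total contribution of inputs i..m-1
    suffix_max = [0.0] * (m + 1)
    for i in range(m - 1, -1, -1):
        suffix_max[i] = suffix_max[i + 1] + tops[i]

    tie = itertools.count()  # unique tie-breaker so payload lists are never compared
    # State: (-upper bound, tie, assigned score, assigned payloads, position in next input)
    heap = [(-suffix_max[0], next(tie), 0.0, [], 0)]
    while heap:
        neg_bound, _, acc, chosen, j = heapq.heappop(heap)
        k = len(chosen)  # index of the next input to assign
        if k == m:
            yield -neg_bound, chosen  # complete state: bound equals its exact score
            continue
        cand = inputs[k].get(j)
        if cand is None:
            continue
        score, payload = cand
        # Child 1: assign tuple j of input k, if the user-defined constraints still hold.
        extended = chosen + [payload]
        if pred(extended):
            new_acc = acc + score
            heapq.heappush(heap, (-(new_acc + suffix_max[k + 1]), next(tie), new_acc, extended, 0))
        # Child 2: skip tuple j; its bound uses tuple j + 1, whose score is no higher.
        nxt = inputs[k].get(j + 1)
        if nxt is not None:
            heapq.heappush(heap, (-(acc + nxt[0] + suffix_max[k + 1]), next(tie), acc, chosen, j + 1))
```

A child's bound never exceeds its parent's bound, and complete states carry exact scores. As a result, the first complete state popped from the queue is the best remaining result, and results stream out one at a time. Because the generator itself emits `(score, payload)` pairs in non-increasing score order, its output can be passed as an input stream to another `jstar_sketch` call. This mirrors the pipelining property described above. For example, with two hypothetical lists and a "same color" constraint:

```python
A = [(0.9, ("a1", "red")), (0.8, ("a2", "blue")), (0.3, ("a3", "red"))]
B = [(0.95, ("b1", "blue")), (0.7, ("b2", "red")), (0.1, ("b3", "blue"))]

def same_color(prefix):
    return all(t[1] == prefix[0][1] for t in prefix)

for score, combo in jstar_sketch([A, B], same_color):
    print(round(score, 2), [t[0] for t in combo])
# 1.75 ['a2', 'b1']
# 1.6 ['a1', 'b2']
# 1.0 ['a3', 'b2']
# 0.9 ['a2', 'b3']
```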
