Boolean + ranking: querying a database by k-constrained optimization

The widespread use of databases for managing structured data, compounded by the expanded reach of the Internet, has brought new data retrieval and analysis scenarios to RDBMSs. In such settings, queries often take the form of k-constrained optimization: a Boolean constraint selects the admissible tuples, a numeric expression serves as the goal function, and only the top-k tuples under that goal are retrieved. This paper proposes supporting such queries, as their nature suggests, with functional optimization machinery over the search space of multiple indices. To realize this concept, we combine the dual perspectives of discrete state search (from the view of indices) and continuous function optimization (from the view of goal functions). As the marriage of the two perspectives, we present the OPT* framework, which encodes k-constrained optimization as an A* search over the composite space of multiple indices, driven by functional optimization that provides tight heuristics. By processing queries as optimization, OPT* significantly outperforms baseline approaches, by margins of up to three orders of magnitude.
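The abstract does not spell out OPT*'s index structures or heuristics, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes tuples with two numeric attributes (x, y), a hypothetical convex goal function (squared distance to a target point (qx, qy), to be minimized), and a hypothetical Boolean constraint x + y <= budget. A search state is a box that restricts both attributes at once, standing in for a combination of ranges from two single-attribute indices; boxes are formed on the fly by median splits rather than read from real indices. The heuristic for a box is the exact minimum of the goal over that box (obtained by clamping the unconstrained optimum into it), and boxes that cannot contain any tuple satisfying the constraint are pruned. The function name `opt_topk` and all parameters are illustrative.

```python
import heapq
import itertools
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xlo, xhi, ylo, yhi)


def bounding_box(pts: List[Point]) -> Box:
    """Tightest axis-aligned box around a non-empty list of points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), max(xs), min(ys), max(ys))


def opt_topk(points: List[Point], k: int, qx: float, qy: float,
             budget: float, leaf_size: int = 2) -> List[Tuple[float, Point]]:
    """Hypothetical sketch: top-k points minimizing (x-qx)^2 + (y-qy)^2
    subject to the Boolean constraint x + y <= budget.

    Best-first (A*-style) search: regions are expanded in order of a lower
    bound on the goal, and a tuple is emitted once it reaches the heap top.
    """

    def goal(x: float, y: float) -> float:
        return (x - qx) ** 2 + (y - qy) ** 2

    def lower_bound(box: Box) -> float:
        # The goal is convex and separable, so its exact minimum over a box
        # is attained by clamping the unconstrained optimum (qx, qy) into it.
        xlo, xhi, ylo, yhi = box
        return goal(min(max(qx, xlo), xhi), min(max(qy, ylo), yhi))

    def may_satisfy(box: Box) -> bool:
        # The smallest x + y in the box is xlo + ylo; if even that exceeds
        # the budget, no tuple in the region can pass the Boolean constraint.
        return box[0] + box[2] <= budget

    if not points or k <= 0:
        return []
    leaf_size = max(leaf_size, 1)  # guarantees both halves of a split are non-empty
    tie = itertools.count()  # tie-breaker so the heap never compares payloads
    heap = [(lower_bound(bounding_box(points)), next(tie), False, points)]
    results: List[Tuple[float, Point]] = []
    while heap and len(results) < k:
        score, _, is_tuple, payload = heapq.heappop(heap)
        if is_tuple:
            # Every remaining entry has a bound >= score, so this rank is final.
            results.append((score, payload))
            continue
        if len(payload) <= leaf_size:
            for x, y in payload:
                if x + y <= budget:  # exact Boolean check on each tuple
                    heapq.heappush(heap, (goal(x, y), next(tie), True, (x, y)))
            continue
        # Split the region at the median of its wider dimension.
        xlo, xhi, ylo, yhi = bounding_box(payload)
        dim = 0 if xhi - xlo >= yhi - ylo else 1
        ordered = sorted(payload, key=lambda p: p[dim])
        mid = len(ordered) // 2
        for part in (ordered[:mid], ordered[mid:]):
            box = bounding_box(part)
            if may_satisfy(box):
                heapq.heappush(heap, (lower_bound(box), next(tie), False, part))
    return results
```

For example, `opt_topk([(1, 1), (3, 3), (6, 6), (2, 5)], k=2, qx=4.0, qy=4.0, budget=8.0)` returns `[(2.0, (3, 3)), (5.0, (2, 5))]`: (6, 6) violates the constraint, and (1, 1) is never emitted because two better tuples reach the heap top first. Because each box's bound never exceeds the goal value of any tuple inside it, a tuple popped from the heap cannot be beaten by anything still unexpanded, which is what lets the search stop after k results instead of scoring every tuple.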
