Rank-aware query processing and optimization

Efficient execution of ranking queries is becoming a major challenge for database technology. DBMSs provide efficient update, indexing, concurrency control, and recovery. Information retrieval (IR) on text and multimedia, on the other hand, requires techniques involving uncertainty and ranking for effective retrieval. The main goal of this paper is to give an in-depth look at supporting ranking queries as an increasingly interesting area of research. We cover the state-of-the-art techniques in research prototypes and industry-strength database engines for efficient handling of ranking queries. We focus primarily on how to integrate ranking as a new query processing and optimization dimension, with the aim of supporting ranking queries as a basic, core functionality. The paper identifies several challenges that need to be addressed to achieve true support for ranking and effective retrieval in database management systems.
