An optimal and progressive algorithm for skyline queries

The skyline of a set of d-dimensional points contains the points that are not dominated by any other point on all dimensions. Skyline computation has recently received considerable attention in the database community, especially for progressive (or online) algorithms that can quickly return the first skyline points without having to read the entire data file. Currently, the most efficient algorithm is NN (nearest neighbors), which applies the divide-and-conquer framework on datasets indexed by R-trees. Although NN has some desirable features (such as high speed for returning the initial skyline points, applicability to arbitrary data distributions and dimensions), it also presents several inherent disadvantages (need for duplicate elimination if d > 2, multiple accesses of the same node, large space overhead). In this paper we develop BBS (branch-and-bound skyline), a progressive algorithm also based on nearest neighbor search, which is I/O optimal, i.e., it performs a single access only to those R-tree nodes that may contain skyline points. Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN. Finally, BBS is simple to implement and can be efficiently applied to a variety of alternative skyline queries. An analytical and experimental comparison shows that BBS outperforms NN (usually by orders of magnitude) under all problem instances.
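To make the notions of dominance and progressive output concrete, the following is a minimal in-memory sketch. It is not the BBS algorithm of the paper: there is no R-tree and no I/O, and it assumes that smaller values are preferred on every dimension. It keeps only the idea of visiting candidates in ascending distance from the origin (here the sum of coordinates), so that each point surviving the dominance check can be reported immediately. The names `dominates` and `progressive_skyline` are hypothetical and chosen for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p is no worse than q on every dimension and
    strictly better on at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def progressive_skyline(points: Iterable[Point]) -> Iterator[Point]:
    """Yield skyline points one at a time, without waiting for the full result.

    Points are visited in ascending order of their coordinate sum.
    Any dominator of a point has a strictly smaller sum, so a point
    that is not dominated by the skyline found so far can never be
    dominated by a point visited later and is safe to report at once.
    """
    heap = [(sum(p), tuple(p)) for p in points]
    heapq.heapify(heap)
    skyline: List[Point] = []
    while heap:
        _, p = heapq.heappop(heap)
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
            yield p


if __name__ == "__main__":
    # Illustrative data only, e.g. (price, distance) pairs.
    data = [(1, 9), (2, 7), (3, 3), (4, 4), (6, 2), (9, 1), (5, 8)]
    for point in progressive_skyline(data):
        print(point)
```

Because the generator yields each point as soon as it is confirmed, a caller can stop after the first few results, which mirrors the progressive behaviour described above. An index-based method such as BBS would additionally prune whole R-tree nodes rather than examine every point.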
