Finding k-dominant skylines in high dimensional space

Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that dominates it. Skyline queries, which return skyline points, are useful in many decision-making applications.

Unfortunately, as the number of dimensions increases, the chance of one point dominating another becomes very low. As a result, the skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high-dimensional space, we propose a new concept, called the k-dominant skyline, which relaxes the idea of dominance to k-dominance. A point p is said to k-dominate another point q if there are k ≤ d dimensions in which p is better than or equal to q and p is better than q in at least one of these k dimensions. A point that is not k-dominated by any other point is in the k-dominant skyline.

We prove various properties of the k-dominant skyline. In particular, because the k-dominance relation is not transitive, existing skyline algorithms cannot be adapted for the k-dominant skyline. We then present several new algorithms for finding the k-dominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
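The definition of k-dominance can be made concrete with a small sketch. The code below is a naive quadratic-time illustration, not one of the algorithms proposed in the paper. It assumes that smaller values are better in every dimension, and the names `k_dominates` and `k_dominant_skyline` are hypothetical, chosen for this example only.

```python
from typing import List, Sequence

Point = Sequence[float]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller is better).

    p k-dominates q when some k dimensions exist in which p is better
    than or equal to q, with p strictly better in at least one of them.
    That holds exactly when p is <= q in at least k dimensions and
    strictly < q in at least one dimension.
    """
    better_or_equal = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = sum(1 for a, b in zip(p, q) if a < b)
    return better_or_equal >= k and strictly_better >= 1


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Brute-force check: keep each point that no other point k-dominates."""
    return [
        q
        for i, q in enumerate(points)
        if not any(
            k_dominates(p, q, k) for j, p in enumerate(points) if j != i
        )
    ]


# Three points in d = 3 dimensions whose values are rotations of each other.
a, b, c = (1, 2, 3), (2, 3, 1), (3, 1, 2)
points = [a, b, c]

full_skyline = k_dominant_skyline(points, k=3)  # k = d is ordinary dominance
relaxed_skyline = k_dominant_skyline(points, k=2)
```

In this example no point dominates another under ordinary dominance (k = d = 3), so all three points are in the skyline. With k = 2, a 2-dominates b and b 2-dominates c, yet a does not 2-dominate c, which illustrates the loss of transitivity. Moreover c 2-dominates a, so the relation is cyclic here and the 2-dominant skyline is empty.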