On High Dimensional Skylines

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multi-dimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different number of dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding top-k frequent skyline points. But the algorithms thus far proposed for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.

[1]  Ivan Stojmenovic,et al.  An optimal parallel algorithm for solving the maximal elements problem in the plane , 1988, Parallel Comput..

[2]  Michael J. Carey,et al.  On saying “Enough already!” in SQL , 1997, SIGMOD '97.

[3]  Kian-Lee Tan,et al.  Stratified computation of skylines with partially-ordered domains , 2005, SIGMOD '05.

[4]  Jarek Gryz,et al.  Maximal Vector Computation in Large Data Sets , 2005, VLDB.

[5]  Jirí Matousek,et al.  Computing Dominances in E^n , 1991, Inf. Process. Lett..

[6]  Donald Kossmann,et al.  The Skyline operator , 2001, Proceedings 17th International Conference on Data Engineering.

[7]  Bernhard Seeger,et al.  An optimal and progressive algorithm for skyline queries , 2003, SIGMOD '03.

[8]  Beng Chin Ooi,et al.  Efficient Progressive Skyline Computation , 2001, VLDB.

[9]  Richard M. Karp,et al.  Monte-Carlo Approximation Algorithms for Enumeration Problems , 1989, J. Algorithms.

[10]  Jan Chomicki,et al.  Hippo: A System for Computing Consistent Answers to a Class of SQL Queries , 2004, EDBT.

[11]  Donald Kossmann,et al.  Shooting Stars in the Sky: An Online Algorithm for Skyline Queries , 2002, VLDB.

[12]  Michael Ian Shamos,et al.  Computational geometry: an introduction , 1985 .

[13]  Jian Pei,et al.  Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces , 2005, VLDB.

[14]  Wolf-Tilo Balke,et al.  Efficient Distributed Skylining for Web Information Systems , 2004, EDBT.

[15]  Hongjun Lu,et al.  Stabbing the sky: efficient skyline computation over sliding windows , 2005, 21st International Conference on Data Engineering (ICDE'05).

[16]  Mihalis Yannakakis,et al.  Multiobjective query optimization , 2001, PODS '01.

[17]  Qing Liu,et al.  Efficient Computation of the Skyline Cube , 2005, VLDB.

[18]  Jan Chomicki,et al.  Skyline with presorting , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).