Scalable skyline computation using a balanced pivot selection technique

Skyline queries have received considerable attention in the database community as a decision-making operator. Conventional skyline algorithms focus on exploiting dominance between points to remove non-skyline points as efficiently as possible, but they neglect incomparability between points, which can be used to bypass unnecessary comparisons. To design a scalable skyline algorithm, we first analyze a cost model that accounts for both dominance and incomparability, and then develop a novel technique for selecting a cost-optimal point, called a pivot point, that minimizes the number of comparisons in point-based space partitioning. We implement the proposed pivot point selection technique in existing sorting-based and partitioning-based algorithms. For point insertions and deletions, we also discuss how to maintain the current skyline using a skytree, a structure derived from recursive point-based space partitioning. Furthermore, we design an efficient greedy algorithm that uses the skytree to compute the k representative skyline. Experimental results demonstrate that the proposed algorithms are significantly faster than state-of-the-art algorithms.
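
To make the roles of dominance and incomparability concrete, the following is a minimal Python sketch of one level of point-based space partitioning. It is an illustration under stated assumptions, not the algorithm proposed in the paper: smaller values are assumed to be better in every dimension, the function names (`dominates`, `region_mask`, `may_dominate`, `pick_pivot_min_sum`, `skyline_with_pivot`) are hypothetical, and the pivot is chosen by a simple minimum-coordinate-sum heuristic that stands in for the cost-optimal selection described above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def region_mask(point: Point, pivot: Point) -> int:
    """Bit i is set when the point is no better than the pivot in dimension i."""
    mask = 0
    for i, (x, v) in enumerate(zip(point, pivot)):
        if x >= v:
            mask |= 1 << i
    return mask


def may_dominate(mask_a: int, mask_b: int) -> bool:
    """A point in region A can dominate a point in region B only if A's set
    bits are a subset of B's; otherwise the two regions are incomparable."""
    return mask_a & mask_b == mask_a


def pick_pivot_min_sum(points: Sequence[Point]) -> Point:
    """Stand-in pivot heuristic (an assumption, not the paper's technique):
    the point with the smallest coordinate sum is always a skyline point."""
    return min(points, key=sum)


def skyline_with_pivot(points: Sequence[Point]) -> Tuple[List[Point], int]:
    """Return the skyline and the number of full dominance tests performed."""
    pts = [tuple(p) for p in points]
    pivot = pick_pivot_min_sum(pts)
    full = (1 << len(pivot)) - 1
    tagged = [(region_mask(p, pivot), p) for p in pts]
    # Points in the all-ones region are dominated by (or equal to) the pivot.
    candidates = [(m, p) for m, p in tagged if m != full or p == pivot]
    skyline, tests = [], 0
    for mq, q in candidates:
        dominated = False
        for mp, p in candidates:
            if not may_dominate(mp, mq):
                continue  # incomparable regions: skip the dominance test
            tests += 1
            if dominates(p, q):
                dominated = True
                break
        if not dominated:
            skyline.append(q)
    return skyline, tests


if __name__ == "__main__":
    data = [(1, 9), (2, 7), (4, 4), (5, 5), (7, 2), (9, 1), (8, 8), (3, 8)]
    result, tests = skyline_with_pivot(data)
    print(sorted(result), tests)
```

In this sketch the pivot does two jobs: it removes every point in the region it dominates outright, and the region bitmasks let pairs of points from incomparable regions skip the dominance test entirely. A recursive version would repeat the same partitioning inside each region.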
