Skyline Operator on Anti-correlated Distributions

Finding the skyline in a multi-dimensional space is relevant to a wide range of applications. The skyline operator over a set of d-dimensional points selects the points that are not dominated by any other point, where one point dominates another if it is at least as good on all dimensions and strictly better on at least one. It therefore provides a minimal set of candidates from which users can make their own trade-offs among all optimal solutions. Existing algorithms establish the worst-case complexity by disregarding the data distribution and the average-case complexity by assuming dimensional independence. However, real-world data is more likely to be anti-correlated. Cardinality and complexity analysis based on dimensionally independent data is therefore not meaningful when dealing with anti-correlated data. Furthermore, the performance of existing algorithms becomes impractical on anti-correlated data. In this paper, we establish a cardinality model for anti-correlated distributions and propose an accurate polynomial estimation of the expected skyline cardinality. Because high skyline cardinality degrades the performance of most existing algorithms on anti-correlated data, we further develop a determination and elimination framework that extends the widely adopted elimination strategy. The framework is both effective and efficient. Comprehensive experiments on both real datasets and benchmark synthetic datasets demonstrate that our approach significantly outperforms the state-of-the-art algorithms under a wide range of settings.
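To make the skyline operator concrete, the following is a minimal sketch of a naive quadratic-time skyline computation. It is not the determination and elimination framework proposed in this paper. It assumes that smaller values are preferred on every dimension, and the names `dominates`, `skyline`, `anti` and `corr` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better on every dimension. p dominates q when
    p is no worse on all dimensions and strictly better on at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy anti-correlated data: being good on one dimension means being bad on the other.
anti = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]

# Toy correlated data: a point that is good on one dimension is good on the other.
corr = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

anti_skyline = skyline(anti)
corr_skyline = skyline(corr)
```

On the toy anti-correlated input no point dominates any other, so all five points are in the skyline, whereas on the correlated input only `(1, 1)` survives. This illustrates, on a very small scale, why skyline cardinality tends to be high on anti-correlated data.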
