Finding Probabilistic k-Skyline Sets on Uncertain Data

The skyline of a dataset is the set of points that are not dominated by any other point. For uncertain objects, prior work has studied the probabilistic skyline, which returns the objects that have a high probability of being in the skyline. While useful for selecting individual objects, the probabilistic skyline is not sufficient for scenarios where we wish to compute a subset of skyline objects, i.e., a skyline set. In this paper, we generalize the notion of probabilistic skyline to probabilistic k-skyline sets (Pk-SkylineSets), which are the k-object sets that have a high probability of being a skyline set. We present an efficient algorithm for computing probabilistic k-skyline sets. It uses two heuristic pruning strategies and a novel data structure, based on the classic layered range tree, to compute the skyline set probability of each instance set within a worst-case time bound. Experimental results on a real NBA dataset and on synthetic datasets show that Pk-SkylineSets are interesting and useful, and that our algorithms are efficient and scalable.
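
To make the two background notions concrete, the following is a minimal sketch of dominance, the skyline of certain points, and the skyline probability of a single uncertain object. It is not the Pk-SkylineSets algorithm described above. It assumes that smaller values are preferred in every dimension, that each uncertain object is a discrete set of instances with probabilities summing to 1, and that objects are independent of each other. The function names and the toy data are hypothetical.

```python
from math import prod


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Brute-force skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_probability(objects, name):
    """Probability that uncertain object `name` is in the skyline.

    `objects` maps an object name to a list of (instance, probability)
    pairs. Assuming independent objects, an instance is in the skyline
    when every other object materializes as a non-dominating instance.
    """
    total = 0.0
    for inst, pr in objects[name]:
        not_dominated = prod(
            1.0 - sum(q_pr for q, q_pr in other if dominates(q, inst))
            for other_name, other in objects.items()
            if other_name != name
        )
        total += pr * not_dominated
    return total


# Toy data: two uncertain objects, each with two equally likely instances.
objects = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 0.5), ((4, 1), 0.5)],
}

print(skyline([(1, 4), (2, 2), (3, 3), (4, 1)]))  # (3, 3) is dominated by (2, 2)
print(skyline_probability(objects, "A"))  # 0.75
print(skyline_probability(objects, "B"))  # 1.0
```

In the toy data, instance (3, 3) of A is dominated only by instance (2, 2) of B, which occurs with probability 0.5, so A has skyline probability 0.5 + 0.5 × 0.5 = 0.75. This brute-force evaluation is quadratic in the number of instances and is meant only to illustrate the definitions.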
