Ranking uncertain sky: The probabilistic top-k skyline operator

Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both discrete and continuous cases. In this paper, firstly an efficient exact algorithm for computing the top-k skyline objects is developed for discrete cases. To address applications where each object may have a massive set of instances or a continuous probability density function, we also develop an efficient randomized algorithm with an @e@?approximation guarantee. Moreover, our algorithms can be immediately extended to efficiently compute p-skyline; that is, retrieving the uncertain objects with skyline probabilities above a given threshold. Our extensive experiments on synthetic and real data demonstrate the efficiency of both algorithms and the randomized algorithm is highly accurate. They also show that our techniques significantly outperform the existing techniques for computing p-skyline.

[1]  Yufei Tao,et al.  Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions , 2005, VLDB.

[2]  Jian Pei,et al.  Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces , 2005, VLDB.

[3]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[4]  Dan Suciu,et al.  Management of probabilistic data: foundations and challenges , 2007, PODS '07.

[5]  Cyrus Shahabi,et al.  The spatial skyline queries , 2006, VLDB.

[6]  Yufei Tao,et al.  Maintaining sliding window skylines on data streams , 2006, IEEE Transactions on Knowledge and Data Engineering.

[7]  Julius T. Tou,et al.  Information Systems , 1973, GI Jahrestagung.

[8]  Christos Doulkeridis,et al.  SKYPEER: Efficient Subspace Skyline Computation over Distributed Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[9]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[10]  Elke A. Rundensteiner,et al.  Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations , 1997, VLDB.

[11]  Hongjun Lu,et al.  Stabbing the sky: efficient skyline computation over sliding windows , 2005, 21st International Conference on Data Engineering (ICDE'05).

[12]  Beng Chin Ooi,et al.  Skyline Queries Against Mobile Lightweight Devices in MANETs , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[13]  Bernhard Seeger,et al.  An optimal and progressive algorithm for skyline queries , 2003, SIGMOD '03.

[14]  Beng Chin Ooi,et al.  Efficient Progressive Skyline Computation , 2001, VLDB.

[15]  Man Lung Yiu,et al.  Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data , 2007, VLDB.

[16]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[17]  Jian Pei,et al.  Computing Compressed Multidimensional Skyline Cubes Efficiently , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  Jeffrey Scott Vitter,et al.  Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data , 2004, VLDB.

[19]  Nick Koudas,et al.  Finding Skyline and Top-k Bargaining Solutions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[20]  Seung-won Hwang,et al.  Spatial Skyline Queries: An Efficient Geometric Algorithm , 2009, SSTD.

[21]  Donald Kossmann,et al.  The Skyline operator , 2001, Proceedings 17th International Conference on Data Engineering.

[22]  Wolf-Tilo Balke,et al.  Efficient Distributed Skylining for Web Information Systems , 2004, EDBT.

[23]  Jarek Gryz,et al.  Maximal Vector Computation in Large Data Sets , 2005, VLDB.

[24]  Anthony K. H. Tung,et al.  Finding k-dominant skylines in high dimensional space , 2006, SIGMOD Conference.

[25]  Qing Liu,et al.  Efficient Computation of the Skyline Cube , 2005, VLDB.

[26]  Jan Chomicki,et al.  Skyline with presorting , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[27]  Donald Kossmann,et al.  Shooting Stars in the Sky: An Online Algorithm for Skyline Queries , 2002, VLDB.

[28]  Jennifer Widom,et al.  Working Models for Uncertain Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[29]  Divesh Srivastava,et al.  Summarizing Two-Dimensional Data with Skyline-Based Statistical Descriptors , 2008, SSDBM.

[30]  Reynold Cheng,et al.  Efficient Clustering of Uncertain Data , 2006, Sixth International Conference on Data Mining (ICDM'06).

[31]  Hans-Peter Kriegel,et al.  Probabilistic Similarity Join on Uncertain Data , 2006, DASFAA.

[32]  Bin Jiang,et al.  Probabilistic Skylines on Uncertain Data , 2007, VLDB.

[33]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[34]  Tian Xia,et al.  Refreshing the sky: the compressed skycube with efficient support for frequent updates , 2006, SIGMOD Conference.

[35]  Anthony K. H. Tung,et al.  Categorical skylines for streaming data , 2008, SIGMOD Conference.

[36]  Jian Pei,et al.  SUBSKY: Efficient Computation of Skylines in Subspaces , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[37]  Hanan Samet,et al.  Distance browsing in spatial databases , 1999, TODS.

[39]  Hans-Peter Kriegel,et al.  Density-based clustering of uncertain data , 2005, KDD '05.