Asymptotically efficient algorithms for skyline probabilities of uncertain data

Skyline computation is widely used in multicriteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem of computing skyline probabilities for all instances of uncertain objects without the use of thresholds and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case $O(n \sqrt{n} \log n)$ time and $O(n)$ space for 2D data; $O(n^{2-1/d} \log^{d-1} n)$ time and $O(n \log^{d-2} n)$ space for $d$-dimensional data. Furthermore, we study the online version of the problem: given any query point $p$ (unknown until query time), return the probability that no instance in the given data set dominates $p$. We propose an algorithm for answering such an online query for $d$-dimensional data in $O(n^{1-1/d} \log^{d-1} n)$ time after preprocessing the data in $O(n^{2-1/d} \log^{d-1} n)$ time and space.
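To make the two quantities in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper; it is a quadratic-time baseline that only illustrates the definitions. It assumes the usual uncertain-object model (not spelled out in the abstract): each object has mutually exclusive instances whose probabilities sum to at most 1, different objects are independent, and a point dominates another if it is no larger in every dimension and strictly smaller in at least one. The names `dominates`, `non_dominance_probability`, and `skyline_probabilities` are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

# Assumed representation: (object_id, point, probability of this instance).
Instance = Tuple[int, Sequence[float], float]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """Assumed convention: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def non_dominance_probability(instances: List[Instance], p: Sequence[float]) -> float:
    """Probability that no instance dominates p (the online query), in O(n) time.

    Instances of one object are mutually exclusive, so their dominating
    probabilities add up; objects are independent, so the per-object
    survival probabilities multiply.
    """
    dominating_mass = defaultdict(float)
    for obj_id, point, prob in instances:
        if dominates(point, p):
            dominating_mass[obj_id] += prob
    result = 1.0
    for mass in dominating_mass.values():
        result *= 1.0 - mass
    return result


def skyline_probabilities(instances: List[Instance]) -> List[float]:
    """Skyline probability of every instance, in O(n^2) time overall.

    An instance is in the skyline if it occurs and no instance of any
    *other* object dominates it.
    """
    probabilities = []
    for obj_id, point, prob in instances:
        others = [t for t in instances if t[0] != obj_id]
        probabilities.append(prob * non_dominance_probability(others, point))
    return probabilities


# Two uncertain objects, each with two equally likely instances.
data = [
    (0, (1.0, 1.0), 0.5),
    (0, (3.0, 3.0), 0.5),
    (1, (2.0, 2.0), 0.5),
    (1, (0.0, 4.0), 0.5),
]
print(skyline_probabilities(data))                   # [0.5, 0.25, 0.25, 0.5]
print(non_dominance_probability(data, (2.5, 2.5)))   # 0.25
```

This baseline spends linear time per query point and quadratic time for all instances, which is the cost the bounds stated in the abstract improve upon.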

[1]  Bin Jiang, et al. Probabilistic Skylines on Uncertain Data, 2007, VLDB.

[2]  Michael Ian Shamos, et al. Computational geometry: an introduction, 1985.

[3]  Xiang Lian, et al. Monochromatic and bichromatic reverse skyline search over uncertain databases, 2008, SIGMOD Conference.

[4]  Jeffrey Xu Yu, et al. Probabilistic Skyline Operator over Sliding Windows, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[5]  Donald Kossmann, et al. The Skyline operator, 2001, Proceedings 17th International Conference on Data Engineering.

[6]  Chi-Yin Chow, et al. Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Jennifer Widom, et al. ULDBs: databases with uncertainty and lineage, 2006, VLDB.

[8]  Christopher Ré, et al. MYSTIQ: a system for finding more answers by using probabilities, 2005, SIGMOD '05.

[9]  Susanne E. Hambrusch, et al. Database Support for Probabilistic Attributes and Tuples, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[10]  Jian Pei, et al. Computing Compressed Multidimensional Skyline Cubes Efficiently, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[11]  Jennifer Widom, et al. Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[12]  Dan E. Willard, et al. New Data Structures for Orthogonal Range Queries, 1985, SIAM J. Comput.

[13]  Ömer Egecioglu, et al. DeltaSky: Optimal Maintenance of Skyline Deletions without Exclusive Dominance Region Generation, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[14]  Mikhail J. Atallah, et al. Computing all skyline probabilities for uncertain data, 2009, PODS.

[15]  Karl R. Abrahamson. Generalized String Matching, 1987, SIAM J. Comput.

[16]  Kurt Mehlhorn, et al. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry, 2012, EATCS Monographs on Theoretical Computer Science.

[17]  Xiang Lian, et al. Probabilistic ranked queries in uncertain databases, 2008, EDBT '08.

[18]  Jon Louis Bentley, et al. Multidimensional divide-and-conquer, 1980, CACM.

[19]  Shuigeng Zhou, et al. Efficient Skyline Retrieval on Peer-to-Peer Networks, 2007, Future Generation Communication and Networking (FGCN 2007).

[20]  Jian Li, et al. A unified approach to ranking in probabilistic databases, 2009, The VLDB Journal.

[21]  Sunil Prabhakar, et al. Evaluating probabilistic queries over imprecise data, 2003, SIGMOD '03.

[22]  Gerhard Weikum, et al. ACM Transactions on Database Systems, 2005.

[23]  Sunil Prabhakar, et al. Querying imprecise data in moving object environments, 2003, Proceedings 19th International Conference on Data Engineering.

[24]  Jian Pei, et al. Ranking queries on uncertain data: a probabilistic threshold approach, 2008, SIGMOD Conference.

[25]  Christos Doulkeridis, et al. SKYPEER: Efficient Subspace Skyline Computation over Distributed Data, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[26]  Ihab F. Ilyas, et al. Efficient search for the top-k probable nearest neighbors in uncertain databases, 2008, Proc. VLDB Endow.

[27]  Xuemin Lin, et al. Selecting Stars: The k Most Representative Skyline Operator, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[28]  Dan Olteanu, et al. Fast and Simple Relational Processing of Uncertain Data, 2007, 2008 IEEE 24th International Conference on Data Engineering.

[29]  Jeffrey Scott Vitter, et al. Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data, 2004, VLDB.

[30]  Bernhard Seeger, et al. Efficient Computation of Reverse Skyline Queries, 2007, VLDB.

[31]  Xi Zhang, et al. Semantics and evaluation of top-k queries in probabilistic databases, 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[32]  Feifei Li, et al. Semantics of Ranking Queries for Probabilistic Data and Expected Ranks, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[33]  Yufei Tao, et al. Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions, 2005, VLDB.

[34]  Jignesh M. Patel, et al. Efficient Skyline Computation over Low-Cardinality Domains, 2007, VLDB.

[35]  Raghu Ramakrishnan, et al. Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge, 2007, VLDB.

[36]  Kevin Chen-Chuan Chang, et al. URank: formulation and efficient evaluation of top-k queries in uncertain databases, 2007, SIGMOD '07.

[37]  Ambuj K. Singh, et al. Top-k Spatial Joins of Probabilistic Objects, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[38]  Mikhail J. Atallah, et al. Asymptotically efficient algorithms for skyline probabilities of uncertain data, 2011.