Range-Max Queries on Uncertain Data

Let P be a set of n uncertain points in ℝ^d, where each point p_i ∈ P is associated with a real value v_i and a probability α_i ∈ (0,1] of existence, i.e., each p_i exists with an independent probability α_i. We present algorithms for building an index on P so that for a d-dimensional query rectangle ρ, the expected maximum value or the most-likely maximum value in ρ can be computed quickly. The specific contributions of our paper include the following: (i) The first index of sub-quadratic size to achieve a sub-linear query time in any dimension d ≥ 1. It also provides a trade-off between query time and size of the index. (ii) A conditional lower bound for the most-likely range-max queries, based on the conjectured hardness of the set-intersection problem, which suggests that in the worst case the product (query time)^2 × (index size) is Ω(n^2 / polylog(n)). (iii) A linear-size index for estimating the expected range-max value within approximation factor 1/2 in O(log^c n) time, for some constant c > 0; that is, if the expected maximum value is μ then the query procedure returns a value μ' with μ/2 ≤ μ' ≤ μ. (iv) Extensions of our algorithm to more general uncertainty models and for computing the top-k values of the range-max.
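To make the two query types concrete, the following is a minimal brute-force sketch in Python. It is not the index described above; it is a linear-scan baseline that answers one query in O(n log n) time. The function names `in_rect` and `range_max_stats` are hypothetical, and the sketch rests on two assumptions not stated in the abstract: values are distinct, and the maximum of an empty set contributes 0 to the expectation. Under independence, the probability that p_i is the maximum in ρ is α_i times the product of (1 − α_j) over all points p_j in ρ with v_j > v_i.

```python
from typing import List, Optional, Sequence, Tuple

# (coordinates, value v_i, existence probability alpha_i)
Point = Tuple[Sequence[float], float, float]


def in_rect(coords: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> bool:
    """Return True if coords lies in the axis-aligned rectangle [lo, hi]."""
    return all(l <= c <= h for c, l, h in zip(coords, lo, hi))


def range_max_stats(
    points: List[Point], lo: Sequence[float], hi: Sequence[float]
) -> Tuple[float, Optional[float], float]:
    """Brute-force expected max and most-likely max inside the rectangle.

    Assumes distinct values and that an empty outcome contributes 0
    to the expectation (both are assumptions of this sketch).
    Returns (expected_max, most_likely_max_value, its_probability).
    """
    # Scan the points inside the rectangle from largest to smallest value.
    inside = sorted(
        (p for p in points if in_rect(p[0], lo, hi)),
        key=lambda p: p[1],
        reverse=True,
    )
    expected = 0.0
    best_value: Optional[float] = None
    best_prob = 0.0
    none_larger = 1.0  # probability that no larger-valued point exists
    for _, value, alpha in inside:
        p_is_max = alpha * none_larger
        expected += value * p_is_max
        if p_is_max > best_prob:
            best_value, best_prob = value, p_is_max
        none_larger *= 1.0 - alpha
    return expected, best_value, best_prob


if __name__ == "__main__":
    pts = [((0.0, 0.0), 10.0, 0.2), ((1.0, 1.0), 8.0, 0.9), ((5.0, 5.0), 100.0, 1.0)]
    print(range_max_stats(pts, lo=(0.0, 0.0), hi=(2.0, 2.0)))
```

In the sample query, the point with value 100 lies outside the rectangle. The point with value 10 is the maximum with probability 0.2, while the point with value 8 is the maximum with probability 0.8 × 0.9 = 0.72, so the most-likely maximum differs from the largest value present in the range.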
