Range queries on uncertain data

Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k\leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time.
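To make the three query types concrete, the following is a minimal brute-force baseline, not one of the data structures presented here. It assumes each uncertain point is available through its cumulative distribution function $F_p$, so that the probability of lying in $I=[a,b]$ is $F_p(b)-F_p(a)$. All names (`uniform_cdf`, `in_range_probs`, `top1`, `topk`, `threshold`) are hypothetical, and every query scans all $n$ points, which is exactly the cost an index is meant to avoid.

```python
import heapq
from typing import Callable, List

# Assumption: each uncertain point is given by its CDF, a map x -> Pr[X <= x].
Cdf = Callable[[float], float]


def uniform_cdf(lo: float, hi: float) -> Cdf:
    """CDF of a point distributed uniformly on [lo, hi] (illustrative pdf only)."""
    def cdf(x: float) -> float:
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return cdf


def in_range_probs(points: List[Cdf], a: float, b: float) -> List[float]:
    """Probability that each point lies in the query interval I = [a, b]."""
    return [cdf(b) - cdf(a) for cdf in points]


def top1(points: List[Cdf], a: float, b: float) -> int:
    """Index of the point that lies in I with the highest probability."""
    probs = in_range_probs(points, a, b)
    return max(range(len(points)), key=lambda i: probs[i])


def topk(points: List[Cdf], a: float, b: float, k: int) -> List[int]:
    """Indices of the k points with the highest probabilities of lying in I."""
    probs = in_range_probs(points, a, b)
    return heapq.nlargest(k, range(len(points)), key=lambda i: probs[i])


def threshold(points: List[Cdf], a: float, b: float, tau: float) -> List[int]:
    """Indices of all points that lie in I with probability at least tau."""
    probs = in_range_probs(points, a, b)
    return [i for i, p in enumerate(probs) if p >= tau]


# Three uncertain points with uniform pdfs on [0, 4], [1, 3], and [2, 10].
points = [uniform_cdf(0, 4), uniform_cdf(1, 3), uniform_cdf(2, 10)]
```

For the query interval $I=[1,3]$, the three points lie in $I$ with probabilities $0.5$, $1$, and $0.125$, so the top-$1$ answer is point 1, the top-$2$ answer is points 1 and 0, and a threshold query with $\tau=0.5$ returns points 0 and 1.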
