Probabilistic Inverse Ranking Queries over Uncertain Data

Query processing in the uncertain database has become increasingly important due to the wide existence of uncertain data in many real applications. Different from handling precise data, the uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely probabilistic inverse ranking (PIR) query, which retrieves possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Furthermore, we also tackle the problem of PIR query processing in high dimensional spaces, which reduces high dimensional uncertain data to a lower dimensional space. The proposed reduction technique may shed light on processing high dimensional uncertain data for other query types. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.

[1]  Kevin Chen-Chuan Chang,et al.  Enabling data retrieval: by ranking and beyond , 2007 .

[2]  Timos K. Sellis,et al.  A model for the prediction of R-tree performance , 1996, PODS.

[3]  Vagelis Hristidis,et al.  PREFER: a system for the efficient execution of multi-parametric ranked queries , 2001, SIGMOD '01.

[4]  Jennifer Widom,et al.  ULDBs: databases with uncertainty and lineage , 2006, VLDB.

[5]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[6]  Hans-Peter Kriegel,et al.  Probabilistic Similarity Join on Uncertain Data , 2006, DASFAA.

[7]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[8]  Bin Jiang,et al.  Probabilistic Skylines on Uncertain Data , 2007, VLDB.

[9]  Mohamed A. Soliman,et al.  Top-k Query Processing in Uncertain Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[10]  Susanne E. Hambrusch,et al.  Database Support for Probabilistic Attributes and Tuples , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[11]  Philip S. Yu,et al.  Time Series Compressibility and Privacy , 2007, VLDB.

[12]  Xiang Lian,et al.  Monochromatic and bichromatic reverse skyline search over uncertain databases , 2008, SIGMOD Conference.

[13]  Dimitrios Gunopulos,et al.  Answering top-k queries using views , 2006, VLDB.

[14]  Christian Böhm,et al.  The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[15]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[16]  Ambuj K. Singh,et al.  Dimensionality reduction for similarity searching in dynamic databases , 1998, SIGMOD '98.

[17]  Hans-Peter Kriegel,et al.  Probabilistic Nearest-Neighbor Query on Uncertain Objects , 2007, DASFAA.

[18]  Yuan-Chi Chang,et al.  The onion technique: indexing for linear optimization queries , 2000, SIGMOD 2000.

[19]  Kevin Chen-Chuan Chang,et al.  RankSQL: query algebra and optimization for relational top-k queries , 2005, SIGMOD '05.

[20]  Sharad Mehrotra,et al.  Progressive approximate aggregate queries with a multi-resolution tree structure , 2001, SIGMOD '01.

[21]  Walid G. Aref,et al.  Casper*: Query processing for location services without compromising privacy , 2006, TODS.

[22]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[23]  Philippe Bonnet,et al.  GADT: a probability space ADT for representing and querying the physical world , 2002, Proceedings 18th International Conference on Data Engineering.

[24]  Hongjun Lu,et al.  Approximate processing of massive continuous quantile queries over high-speed data streams , 2006, IEEE Transactions on Knowledge and Data Engineering.

[25]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[26]  Sunil Prabhakar,et al.  Querying imprecise data in moving object environments , 2003, IEEE Transactions on Knowledge and Data Engineering.

[27]  Walid G. Aref,et al.  Supporting top-kjoin queries in relational databases , 2004, The VLDB Journal.

[28]  Sunil Prabhakar,et al.  U-DBMS: A Database System for Managing Constantly-Evolving Data , 2005, VLDB.

[29]  Yufei Tao,et al.  Branch-and-bound processing of ranked queries , 2007, Inf. Syst..

[30]  Jian Pei,et al.  Ranking queries on uncertain data: a probabilistic threshold approach , 2008, SIGMOD Conference.