Finding the least influenced set in uncertain databases

Uncertainty is inherent in many real-world applications. In this paper, we investigate an important query in uncertain databases, the probabilistic least influenced set (PLIS) query, which retrieves all uncertain objects in an uncertain database that are, with high probability, the least affected by a given query object. PLIS queries are useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions of the PLIS query (M-PLIS and B-PLIS, respectively). To answer PLIS queries efficiently, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which effectively reduce the PLIS search space and can be seamlessly integrated into efficient query procedures. We also study important variants of the PLIS query with an uncertain query object (UQ-PLIS), and we formulate and tackle the PLIS problem on uncertain moving objects (UMOD-PLIS). Extensive experiments demonstrate the efficiency and effectiveness of the proposed approaches under various settings.
