Probabilistic Group Nearest Neighbor Queries in Uncertain Databases

The importance of query processing over uncertain data has recently arisen due to its wide usage in many real-world applications. In the context of uncertain databases, previous works have studied many query types such as nearest neighbor query, range query, top-k query, skyline query, and similarity join. In this paper, we focus on another important query, namely, probabilistic group nearest neighbor (PGNN) query, in the uncertain database, which also has many applications. Specifically, given a set, Q, of query points, a PGNN query retrieves data objects that minimize the aggregate distance (e.g., sum, min, and max) to query set Q. Due to the inherent uncertainty of data objects, previous techniques to answer group nearest neighbor (GNN) query cannot be directly applied to our PGNN problem. Motivated by this, we propose effective pruning methods, namely, spatial pruning and probabilistic pruning, to reduce the PGNN search space, which can be seamlessly integrated into our PGNN query procedure. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approach, in terms of the wall clock time and the speed-up ratio against linear scan.

[1]  Ambuj K. Singh,et al.  APLA: Indexing Arbitrary Probability Distributions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[2]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[3]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[4]  Yufei Tao,et al.  Reverse Nearest Neighbor Search in Metric Spaces , 2006, IEEE Transactions on Knowledge and Data Engineering.

[5]  Jennifer Widom,et al.  Working Models for Uncertain Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[6]  Wei Hong,et al.  Proceedings of the 5th Symposium on Operating Systems Design and Implementation Tag: a Tiny Aggregation Service for Ad-hoc Sensor Networks , 2022 .

[7]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[8]  Yufei Tao,et al.  Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions , 2005, VLDB.

[9]  Christian Böhm,et al.  The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[10]  Jeffrey Scott Vitter,et al.  Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data , 2004, VLDB.

[11]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[12]  Lei Chen,et al.  Robust and fast similarity search for moving object trajectories , 2005, SIGMOD '05.

[13]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Yufei Tao,et al.  Reverse kNN Search in Arbitrary Dimensionality , 2004, VLDB.

[15]  V. S. Subrahmanian,et al.  RDF aggregate queries and views , 2005, 21st International Conference on Data Engineering (ICDE'05).

[16]  Sunil Prabhakar,et al.  Querying imprecise data in moving object environments , 2003, IEEE Transactions on Knowledge and Data Engineering.

[17]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.

[18]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[19]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[20]  Mohamed A. Soliman,et al.  Top-k Query Processing in Uncertain Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[21]  Hans-Peter Kriegel,et al.  Probabilistic Similarity Join on Uncertain Data , 2006, DASFAA.

[22]  Yufei Tao,et al.  Multidimensional reverse kNN search , 2007, The VLDB Journal.

[23]  Philippe Bonnet,et al.  GADT: a probability space ADT for representing and querying the physical world , 2002, Proceedings 18th International Conference on Data Engineering.

[24]  Bin Jiang,et al.  Probabilistic Skylines on Uncertain Data , 2007, VLDB.

[25]  Kyriakos Mouratidis,et al.  Aggregate nearest neighbor queries in spatial databases , 2005, TODS.

[26]  Kyriakos Mouratidis,et al.  Group nearest neighbor queries , 2004, Proceedings. 20th International Conference on Data Engineering.

[27]  Hans-Peter Kriegel,et al.  Probabilistic Nearest-Neighbor Query on Uncertain Objects , 2007, DASFAA.