Cost Models for Nearest Neighbor Query Processing over Existentially Uncertain Spatial Data

A major challenge posed by real-world applications involving spatial information is the uncertainty inherent in the data. One type of uncertainty concerns the existence of spatial objects: the spatial value of each object is accompanied by a probability reflecting the confidence that the object exists. A challenging query type over existentially uncertain data is the Nearest Neighbor (NN) search, because the likelihood of an object being the NN of the query object depends not only on how its distance compares with those of the other objects, but also on whether those objects exist. In this paper, we present exact and approximate statistical methodologies for supporting cost models for Probabilistic Thresholding NN (PTNN) queries over arbitrarily distributed data points with existential uncertainty, with the aid of novel histograms, sampling and statistical approximations. Our cost model can also be modified to support Probabilistic Ranking NN (PRNN) queries with the aid of sampling. We demonstrate the accuracy of our approaches through extensive experimentation on synthetic and real datasets.
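To illustrate why the NN likelihood depends on both distance and existence, the following minimal sketch computes the probability that each object is the NN of a query point. It assumes, as is common in the existential uncertainty literature, that objects exist independently of one another, so that an object is the NN exactly when it exists and no closer object exists. The function names `nn_probabilities` and `ptnn`, the tuple layout of the objects, and the tie-free distances are assumptions made for illustration; this is not the cost model proposed in the paper, only a brute-force evaluation of the query it models.

```python
import math

def nn_probabilities(query, objects):
    """Return (name, probability of being the NN of `query`) per object.

    `objects` is a list of (name, (x, y), existence_probability) tuples.
    Assumes independent existence and no ties in distance (illustrative).
    """
    # Visit objects from nearest to farthest from the query point.
    ranked = sorted(objects, key=lambda o: math.dist(query, o[1]))
    result = []
    no_closer_exists = 1.0  # probability that every closer object is absent
    for name, _, p in ranked:
        # The object is the NN iff it exists and all closer objects do not.
        result.append((name, p * no_closer_exists))
        no_closer_exists *= (1.0 - p)
    return result

def ptnn(query, objects, threshold):
    """Thresholding query: keep objects whose NN probability >= threshold."""
    return [(n, pr) for n, pr in nn_probabilities(query, objects) if pr >= threshold]

if __name__ == "__main__":
    data = [("a", (1.0, 0.0), 0.5), ("b", (2.0, 0.0), 0.8), ("c", (3.0, 0.0), 1.0)]
    for name, prob in nn_probabilities((0.0, 0.0), data):
        print(name, round(prob, 3))
```

In this sketch a ranking query would simply sort the output of `nn_probabilities` by probability. Note that the farthest object `c` can still be the NN, although with low probability, because both closer objects may be absent.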
