Effectively Indexing the Uncertain Space

With the rapid development of optical, infrared, and radar sensors and GPS techniques, a huge amount of multidimensional uncertain data is collected and accumulated every day. Recently, considerable research effort has gone into indexing, analyzing, and mining uncertain data. As shown in a recent book on uncertain data, effective indexing techniques are highly desirable for managing and mining such data efficiently. Existing index structures for multidimensional data are sensitive to the size or shape of the uncertain regions of uncertain objects and the queries. Based on this observation, in this paper we introduce a novel R-Tree-based inverted index structure, named UI-Tree, to efficiently support various queries over multidimensional uncertain objects in both the continuous and discrete cases, including range queries, similarity joins, and their size estimation, as well as top-k range queries. Comprehensive experiments on both real and synthetic data demonstrate the efficiency of our techniques.
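
To make the query types concrete, the following minimal Python sketch evaluates a range query and a top-k range query over uncertain objects in the discrete case by brute force. It is not the UI-Tree and uses no index. The representation of an object as a list of (location, probability) instances, the probability threshold, and all function names are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Instance = Tuple[Point, float]  # (possible location, probability of that location)


def in_rect(p: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if point p lies inside the axis-aligned rectangle [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def appearance_probability(instances: List[Instance],
                           lo: Sequence[float],
                           hi: Sequence[float]) -> float:
    """Total probability mass of the object's instances that fall in the rectangle."""
    return sum(prob for loc, prob in instances if in_rect(loc, lo, hi))


def range_query(objects: Dict[str, List[Instance]],
                lo: Sequence[float],
                hi: Sequence[float],
                threshold: float) -> List[str]:
    """Objects whose appearance probability in [lo, hi] is at least `threshold`.

    The threshold semantics is an assumption of this sketch.
    """
    return sorted(name for name, inst in objects.items()
                  if appearance_probability(inst, lo, hi) >= threshold)


def top_k_range_query(objects: Dict[str, List[Instance]],
                      lo: Sequence[float],
                      hi: Sequence[float],
                      k: int) -> List[str]:
    """The k objects most likely to lie in [lo, hi], highest probability first."""
    scored = [(appearance_probability(inst, lo, hi), name)
              for name, inst in objects.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by name
    return [name for prob, name in scored[:k] if prob > 0]
```

This scan touches every instance of every object on each query. An index over uncertain data aims to prune most objects without computing their appearance probabilities.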
