Querying and Cleaning Uncertain Data

The management of uncertainty in large databases has attracted considerable research interest in recent years. Data uncertainty is inherent in many emerging and important applications, including location-based services, wireless sensor networks, biometric and biological databases, and data stream applications. In these systems, data uncertainty must be managed carefully in order to make correct decisions and provide high-quality services to users. Uncertain database systems have been proposed to support the development of such applications. They treat data uncertainty as a "first-class citizen": they use generic data models to capture uncertainty and provide query operators that return answers with statistical confidences. We summarize our work on uncertain databases in recent years. We explain how data uncertainty can be modeled, and present a classification of probabilistic queries (e.g., range queries and nearest-neighbor queries). We further study how probabilistic queries can be efficiently evaluated and indexed. We also highlight the issue of removing uncertainty under a stringent cleaning budget, with the aim of generating high-quality probabilistic answers.
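To make the idea of a query that returns answers with statistical confidences concrete, the following is a minimal sketch of a probabilistic range query. It is an illustration only, not the evaluation method of the paper: it assumes each uncertain value is a one-dimensional interval with a uniform probability density, and the names `UncertainValue`, `range_probability`, and `probabilistic_range_query`, along with the sample sensor readings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """A 1-D attribute known only to lie in [lo, hi].

    Assumption for this sketch: the true value is uniformly
    distributed over the interval.
    """
    name: str
    lo: float
    hi: float


def range_probability(obj, q_lo, q_hi):
    """Probability that obj's true value lies inside [q_lo, q_hi]."""
    if obj.hi == obj.lo:
        # Degenerate interval: the value is known exactly.
        return 1.0 if q_lo <= obj.lo <= q_hi else 0.0
    # Under a uniform pdf, the probability is the fraction of the
    # uncertainty interval that overlaps the query range.
    overlap = min(obj.hi, q_hi) - max(obj.lo, q_lo)
    return max(0.0, overlap) / (obj.hi - obj.lo)


def probabilistic_range_query(objects, q_lo, q_hi, threshold=0.0):
    """Return (name, probability) pairs with probability above threshold."""
    results = []
    for obj in objects:
        p = range_probability(obj, q_lo, q_hi)
        if p > threshold:
            results.append((obj.name, p))
    return results


# Hypothetical temperature readings, each with an uncertainty interval.
sensors = [
    UncertainValue("s1", 20.0, 24.0),
    UncertainValue("s2", 25.0, 35.0),
    UncertainValue("s3", 10.0, 15.0),
]
answers = probabilistic_range_query(sensors, 22.0, 30.0, threshold=0.3)
print(answers)
```

Each answer carries the probability that the object satisfies the query, and the threshold discards low-confidence objects. A linear scan is used here for simplicity; it stands in for the indexing and evaluation techniques the paper studies and does not reproduce them.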
