Paper information - Managing uncertainty using probabilistic databases

Managing uncertainty using probabilistic databases

Uncertainty is a fundamental problem underlying several modern database applications: exploratory queries in databases, data integration, querying information extracted from the Web, queries over sensor networks, scientific data management, reasoning about privacy breaches in data mining, and many others. In this work, we describe probabilistic databases as a unifying framework for managing the various kinds of uncertainty that arise in this wide range of applications. In a probabilistic database, each data item has a probability of belonging to the database, and queries return answers that are ranked by probability. We use possible worlds semantics to give a precise meaning to queries over uncertain data. We consider three models for representing probabilistic databases: the independent model, the independent-disjoint model, and the random graph model. The independent and independent-disjoint models represent uncertainty by storing explicit probabilities in the database. We consider several database applications and show how the underlying uncertainty can be represented using these models. The main challenge here is query evaluation: unlike in traditional databases, some queries are #P-complete. We study the complexity of queries in detail and present algorithms and techniques for efficient query evaluation over probabilistic databases. The random graph model is useful for representing statistical information, where the uncertainty is not explicitly defined by probabilities. We show that random graphs serve as a powerful tool for studying the data privacy problem. We present several new results on the properties of random graphs and show their connection to uncertainty in databases.
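To make the possible worlds semantics concrete, the following is a minimal sketch of the independent model, not the algorithms developed in this work. It assumes each tuple is present independently with its stored probability, enumerates every possible world, and sums the probabilities of the worlds in which a Boolean query holds. The names `possible_worlds`, `query_probability`, and the sensor data are hypothetical and chosen only for illustration.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) pairs for a tuple-independent table.

    `tuples` maps each tuple to its marginal probability of being present.
    Assumption: tuples are mutually independent (the independent model).
    """
    items = list(tuples.items())
    for mask in product([False, True], repeat=len(items)):
        world = set()
        prob = 1.0
        for (t, p), present in zip(items, mask):
            if present:
                world.add(t)
                prob *= p
            else:
                prob *= 1.0 - p
        yield world, prob


def query_probability(tuples, query):
    """Probability that a Boolean query is true, summed over all worlds."""
    return sum(prob for world, prob in possible_worlds(tuples) if query(world))


# Hypothetical sensor readings: (sensor id, label) -> probability.
readings = {("s1", "hot"): 0.9, ("s2", "hot"): 0.5, ("s3", "cold"): 0.8}


def some_sensor_hot(world):
    """Boolean query: does at least one sensor report 'hot'?"""
    return any(label == "hot" for _, label in world)


p_hot = query_probability(readings, some_sensor_hot)
total = sum(prob for _, prob in possible_worlds(readings))
```

The enumeration visits 2^n worlds for n tuples, so it is only usable on toy inputs. That exponential cost is the reason efficient query evaluation is the main challenge named in the abstract.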

Dan Suciu | Nilesh Dalvi