Query Evaluation on Probabilistic Databases

We describe a system that supports arbitrarily complex SQL queries with ”uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can compute eciently most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any ecient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.

[1]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[2]  Feng Shao,et al.  XRANK: ranked keyword search over XML documents , 2003, SIGMOD '03.

[3]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[4]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Gerhard Weikum,et al.  The XXL search engine: ranked retrieval of XML data using indexes and ontologies , 2002, SIGMOD '02.

[6]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.

[7]  Gonzalo Navarro,et al.  A guided tour to approximate string matching , 2001, CSUR.

[8]  Yuri Gurevich,et al.  The complexity of query reliability , 1998, PODS.

[9]  Laks V. S. Lakshmanan,et al.  ProbView: a flexible probabilistic database system , 1997, TODS.

[10]  Joseph Y. Halpern,et al.  From Statistical Knowledge Bases to Degrees of Belief , 1996, Artif. Intell..

[11]  Sumit Sarkar,et al.  A probabilistic relational model and algebra , 1996, TODS.

[12]  Justin Zobel,et al.  Phonetic string matching: lessons from information retrieval , 1996, SIGIR '96.

[13]  Mechthild Stoer,et al.  A Simple Min Cut Algorithm , 1994, ESA.

[14]  V. S. Subrahmanian,et al.  Probabilistic Logic Programming , 1992, Inf. Comput..

[15]  Hector Garcia-Molina,et al.  The Management of Probabilistic Data , 1992, IEEE Trans. Knowl. Data Eng..

[16]  Amihai Motro,et al.  VAGUE: a user interface to relational databases that permits vague queries , 1988, TOIS.

[17]  Ronald Fagin,et al.  Reasoning about knowledge and probability , 1988, JACM.

[18]  Michael Pittarelli,et al.  The Theory of Probabilistic Databases , 1987, VLDB.

[19]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[20]  J. Scott Provan,et al.  The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected , 1983, SIAM J. Comput..

[21]  Leslie G. Valiant,et al.  The Complexity of Enumeration and Reliability Problems , 1979, SIAM J. Comput..

[22]  Aristides Gionis,et al.  Automated Ranking of Database Query Results , 2003, CIDR.

[23]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.