Efficient query evaluation on probabilistic databases

We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model, and the results are ranked, much as in Information Retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that, under standard complexity assumptions, these queries do not admit any efficient exact evaluation method. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
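To make the probabilistic semantics concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a tuple-independent table, meaning each tuple is present independently with a given probability, and a query answer whose truth is described by a small DNF formula over tuples. The table contents, the formula, and all function names are hypothetical and chosen only for illustration. The sketch compares exact evaluation by enumerating all possible worlds, which is exponential in the number of tuples, with a naive Monte-Carlo estimate obtained by sampling worlds.

```python
import itertools
import random

# Hypothetical tuple-independent table: tuple id -> probability of being present.
TUPLES = {"t1": 0.5, "t2": 0.4, "t3": 0.9}

# Hypothetical DNF for one answer: it holds if (t1 AND t2) OR t3.
LINEAGE = [("t1", "t2"), ("t3",)]


def holds(world, dnf):
    """Return True if at least one clause has all of its tuples in the world."""
    return any(all(t in world for t in clause) for clause in dnf)


def exact_probability(probs, dnf):
    """Sum the weights of all possible worlds in which the answer holds."""
    names = list(probs)
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(names)):
        world = {n for n, present in zip(names, bits) if present}
        weight = 1.0
        for n, present in zip(names, bits):
            weight *= probs[n] if present else 1.0 - probs[n]
        if holds(world, dnf):
            total += weight
    return total


def monte_carlo_probability(probs, dnf, samples, seed=0):
    """Estimate the same probability by sampling worlds independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {n for n, p in probs.items() if rng.random() < p}
        if holds(world, dnf):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print("exact:   ", exact_probability(TUPLES, LINEAGE))
    print("estimate:", monte_carlo_probability(TUPLES, LINEAGE, 20000))
```

Here the exact value is 1 - (1 - 0.5 * 0.4)(1 - 0.9) = 0.92, and the sampled estimate should fall close to it. Naive sampling of this kind needs many samples when the answer probability is very small, which is one reason more refined estimators are used in practice.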
