Ontology-Based Access to Probabilistic Data

We propose a framework for querying probabilistic data in the presence of an ontology, arguing that the interplay of probabilities and ontologies is fruitful in applications such as managing data extracted from the web. The prime inference problem is computing answer probabilities, and we show that, as in traditional ontology-based data access, it can be implemented using standard probabilistic database systems. We demonstrate that query rewriting into first-order logic is an important tool for our framework. First, it is used to establish a PTime vs. #P dichotomy for the data complexity of this problem by lifting a corresponding result from probabilistic databases. Then, we use it to characterize which pairs of query and TBox are in PTime. Finally, we show that non-existence of such a rewriting implies #P-hardness.
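To make the role of query rewriting concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a tuple-independent probabilistic ABox and a hypothetical TBox with the axioms Professor ⊑ Teacher and ∃teaches ⊑ Teacher; the facts, probabilities, and function names are invented for illustration. The query Teacher(x) is rewritten by hand into a union of conjunctive queries, and its answer probability is computed by brute-force enumeration of possible worlds, which is exponential and serves only to show the semantics.

```python
from itertools import product

# Hypothetical tuple-independent probabilistic ABox: each fact is present
# independently with the given probability (values are illustrative only).
FACTS = {
    ("Professor", "ann"): 0.5,
    ("teaches", "ann", "db"): 0.4,
}


def rewritten_query(world, x):
    """Hand-written rewriting of Teacher(x) under the assumed TBox
    {Professor ⊑ Teacher, ∃teaches ⊑ Teacher}:
    Teacher(x) ∨ Professor(x) ∨ ∃y teaches(x, y)."""
    return (
        ("Teacher", x) in world
        or ("Professor", x) in world
        or any(f[0] == "teaches" and f[1] == x for f in world)
    )


def answer_probability(facts, query, x):
    """Sum the probabilities of all possible worlds in which the rewritten
    query holds for x (brute force, exponential in the number of facts)."""
    items = list(facts.items())
    total = 0.0
    for choices in product([True, False], repeat=len(items)):
        weight = 1.0
        world = set()
        for (fact, p), present in zip(items, choices):
            weight *= p if present else 1.0 - p
            if present:
                world.add(fact)
        if query(world, x):
            total += weight
    return total


if __name__ == "__main__":
    print(answer_probability(FACTS, rewritten_query, "ann"))
```

Because the rewriting is an ordinary first-order query over the stored facts, the TBox is no longer needed at evaluation time; in the framework described above, this is what allows a standard probabilistic database system to take over the probability computation instead of the enumeration used here.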
