URDF: Efficient Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules

We present URDF, an efficient reasoning framework for graph-based, non-schematic RDF knowledge bases and SPARQL-like queries. URDF augments first-order reasoning with a combination of soft rules, in the form of Datalog-style recursive implications, and hard rules, in the form of mutually exclusive sets of facts. It incorporates the common possible-worlds semantics with independent base facts, as is prevalent in most probabilistic database approaches, but also supports semantically more expressive probabilistic first-order representations such as Markov Logic Networks. As knowledge extraction on the Web is often an iterative (and inherently noisy) process, URDF explicitly targets the resolution of inconsistencies between the underlying RDF base facts and the inference rules. The core of our approach is a novel and efficient approximation algorithm for a generalized version of the Weighted MAX-SAT problem, allowing us to dynamically resolve such inconsistencies directly at query processing time. Our MAX-SAT algorithm has a worst-case running time of O(|C| · |S|), where |C| and |S| denote the number of facts in grounded soft and hard rules, respectively, and it comes with tight approximation guarantees with respect to the shape of the rules and the distribution of confidences of the facts they contain. Experiments over various benchmark settings confirm a high robustness and significantly improved runtime of our reasoning framework in comparison to state-of-the-art techniques for MCMC sampling such as MAP inference and MC-SAT.
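To make the problem setting concrete, the following is a minimal sketch of Weighted MAX-SAT over grounded soft rules with hard mutual-exclusion sets. It is an exhaustive search over a toy instance, not the approximation algorithm of the paper, and it is exponential in the number of facts. The function name `max_sat_with_exclusions`, the example facts, and all weights are hypothetical and chosen only for illustration.

```python
from itertools import product

def max_sat_with_exclusions(facts, soft_clauses, hard_sets):
    """Brute-force Weighted MAX-SAT (illustrative only, exponential time).

    facts:        list of fact names (the Boolean variables)
    soft_clauses: list of (weight, clause); a clause is a list of
                  (fact, polarity) literals and is satisfied if any
                  literal matches the truth assignment
    hard_sets:    list of sets of facts; at most one fact per set
                  may be true in a consistent world
    """
    best_weight, best_world = -1.0, None
    for values in product([False, True], repeat=len(facts)):
        world = dict(zip(facts, values))
        # Hard rules: discard worlds with two true facts in one exclusive set.
        if any(sum(world[f] for f in s) > 1 for s in hard_sets):
            continue
        # Soft rules: sum the weights of all satisfied clauses.
        weight = sum(w for w, clause in soft_clauses
                     if any(world[f] == polarity for f, polarity in clause))
        if weight > best_weight:
            best_weight, best_world = weight, world
    return best_weight, best_world

# Hypothetical toy instance: two conflicting birthplaces and one residence.
facts = ["bornIn(Al,Ulm)", "bornIn(Al,Paris)", "livesIn(Al,Ulm)"]
soft_clauses = [
    (0.9, [("bornIn(Al,Ulm)", True)]),      # base fact with confidence 0.9
    (0.4, [("bornIn(Al,Paris)", True)]),    # base fact with confidence 0.4
    (0.6, [("livesIn(Al,Ulm)", True)]),     # base fact with confidence 0.6
    # Grounded soft rule livesIn(Al,Ulm) -> bornIn(Al,Ulm), written as a clause.
    (0.5, [("livesIn(Al,Ulm)", False), ("bornIn(Al,Ulm)", True)]),
]
hard_sets = [{"bornIn(Al,Ulm)", "bornIn(Al,Paris)"}]  # a person has one birthplace

best_weight, best_world = max_sat_with_exclusions(facts, soft_clauses, hard_sets)
print(best_world)
```

In this toy instance the hard rule forces a choice between the two birthplaces, and the highest-weight consistent world keeps `bornIn(Al,Ulm)` and `livesIn(Al,Ulm)` while rejecting `bornIn(Al,Paris)`.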
