Provenance Circuits for Trees and Treelike Instances

Query evaluation in monadic second-order logic (MSO) is tractable on trees and treelike instances, even though it is hard for arbitrary instances. This tractability result has been extended to several tasks related to query evaluation, such as counting query results [3] or performing query evaluation on probabilistic trees [10]. These are two examples of the more general problem of computing augmented query output, which is referred to as provenance. This article presents a provenance framework for trees and treelike instances, by describing a linear-time construction of a circuit provenance representation for MSO queries. We show how this provenance can be connected to the usual definitions of semiring provenance on relational instances [20], even though we compute it in an unusual way, using tree automata; we do so via intrinsic definitions of provenance for general semirings, independent of the operational details of query evaluation. We show applications of this provenance to capture existing counting and probabilistic results on trees and treelike instances, and give novel consequences for probability evaluation.
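To make the idea of a provenance circuit built from a tree automaton more concrete, the following is a minimal sketch in Python, not the construction of the article. It assumes a toy setting: a binary tree in which every node carries one Boolean variable that decides whether its label is 1 or 0, and a hypothetical two-state bottom-up automaton that accepts when some node is labeled 1. All names (build_circuit, evaluate, run, the tuple encoding of trees and gates) are illustrative assumptions. The circuit has one OR gate per (node, state) pair, fed by AND gates over the node's literal and the children's state gates, so its size is linear in the tree for a fixed automaton.

```python
from itertools import product

# Hypothetical automaton: state 1 means "some node in the subtree is labeled 1".
STATES = (0, 1)
FINAL = {1}

def leaf_delta(label):
    return label

def node_delta(label, q_left, q_right):
    return 1 if (label or q_left or q_right) else 0

def run(tree, valuation):
    """Run the automaton directly; a tree is (name,) or (name, left, right)."""
    name, *children = tree
    label = int(valuation[name])
    if not children:
        return leaf_delta(label)
    left, right = children
    return node_delta(label, run(left, valuation), run(right, valuation))

def build_circuit(tree, gates):
    """Append gates for `tree`; return {state: index of its gate} at the root."""
    name, *children = tree
    gates.append(("var", name))
    pos = len(gates) - 1
    gates.append(("not", pos))
    neg = len(gates) - 1
    literal = {1: pos, 0: neg}
    if children:
        left = build_circuit(children[0], gates)
        right = build_circuit(children[1], gates)
    disjuncts = {q: [] for q in STATES}
    for label in (0, 1):
        if not children:
            disjuncts[leaf_delta(label)].append(literal[label])
        else:
            # One AND gate per (label, left state, right state) combination.
            for ql, qr in product(STATES, repeat=2):
                gates.append(("and", (literal[label], left[ql], right[qr])))
                disjuncts[node_delta(label, ql, qr)].append(len(gates) - 1)
    out = {}
    for q in STATES:
        gates.append(("or", tuple(disjuncts[q])))
        out[q] = len(gates) - 1
    return out

def provenance_circuit(tree):
    """Return the gate list; the last gate is true iff the automaton accepts."""
    gates = []
    root = build_circuit(tree, gates)
    gates.append(("or", tuple(root[q] for q in sorted(FINAL))))
    return gates

def evaluate(gates, valuation):
    """Evaluate gates in index order (inputs always precede the gate itself)."""
    values = []
    for kind, arg in gates:
        if kind == "var":
            values.append(bool(valuation[arg]))
        elif kind == "not":
            values.append(not values[arg])
        elif kind == "and":
            values.append(all(values[i] for i in arg))
        else:  # "or"; an empty OR evaluates to False
            values.append(any(values[i] for i in arg))
    return values

if __name__ == "__main__":
    tree = ("a", ("b",), ("c", ("d",), ("e",)))
    gates = provenance_circuit(tree)
    valuation = {"a": False, "b": False, "c": False, "d": True, "e": False}
    print(len(gates), evaluate(gates, valuation)[-1])
```

In this sketch each leaf contributes 4 gates and each internal node 12, independently of the tree size, which is what makes the circuit linear in the instance for a fixed automaton. Evaluating the output gate under a valuation agrees with running the automaton on the tree that the valuation selects.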

[1] James W. Thatcher, et al. Generalized finite automata theory with an application to a decision problem of second-order logic, 1968, Mathematical systems theory.

[2] Val Tannen, et al. Provenance semirings, 2007, PODS.

[3] Thomas Schwentick, et al. Query automata over finite trees, 2002, Theor. Comput. Sci.

[4] Thomas Eiter, et al. Query answering in description logics with transitive roles, 2009, IJCAI 2009.

[5] Pierre Senellart, et al. Probabilistic XML: Models and Complexity, 2013, Advances in Probabilistic Databases for Uncertain Information Management.

[6] Adnan Darwiche, et al. Inference in belief networks: A procedural guide, 1996, Int. J. Approx. Reason.

[7] Dan Olteanu, et al. MayBMS: a probabilistic database management system, 2009, SIGMOD Conference.

[8] Albert R. Meyer, et al. Weak monadic second order theory of successor is not elementary-recursive, 1973.

[9] Jörg Flum, et al. Query evaluation via tree-decompositions, 2001, JACM.

[10] B. A. Reed, et al. Algorithmic Aspects of Tree Width, 2003.

[11] James Cheney, et al. Provenance in Databases: Why, How, and Where, 2009, Found. Trends Databases.

[12] Sebastian Rudolph, et al. Flag & check: data access with monadically defined queries, 2013, PODS '13.

[13] B. Mohar, et al. Graph Minors, 2009.

[14] Christopher Ré, et al. Probabilistic databases, 2011, SIGA.

[15] Jaehong Park, et al. A provenance-based access control model, 2012, 2012 Tenth Annual International Conference on Privacy, Security and Trust.

[16] Frank Neven, et al. Automata, Logic, and XML, 2002, CSL.

[17] Dan Suciu, et al. Query containment for conjunctive queries with regular expressions, 1998, PODS.

[18] Surajit Chaudhuri, et al. On the equivalence of recursive and nonrecursive datalog programs, 1992, J. Comput. Syst. Sci.

[19] Stijn Heymans, et al. DReW: a Reasoner for Datalog-rewritable Description Logics and DL-Programs, 2010.

[20] Balder ten Cate, et al. Queries with Guarded Negation, 2012, Proc. VLDB Endow.

[21] Ronald Fagin, et al. The closure of Monadic NP (extended abstract), 1998, STOC '98.

[22] Stefan Rümmele, et al. Counting and Enumeration Problems with Bounded Treewidth, 2010, LPAR.

[23] Detlef Seese, et al. Easy Problems for Tree-Decomposable Graphs, 1991, J. Algorithms.

[24] Andrea Calì, et al. Taming the Infinite Chase: Query Answering under Expressive Relational Constraints, 2008, Description Logics.

[25] Hubert Comon, et al. Tree automata techniques and applications, 1997.

[26] Antonella Poggi, et al. On database query languages for K-relations, 2010, J. Appl. Log.

[27] Oded Shmueli, et al. Decidability and expressiveness aspects of logic queries, 1987, XP7.52 Workshop on Database Theory.

[28] Sebastian Rudolph, et al. Schema-Agnostic Query Rewriting in SPARQL 1.1, 2014, International Semantic Web Conference.

[29] Shirley Dex, et al. JR 旅客販売総合システム(マルス)における運用及び管理について, 1991.

[30] David J. Spiegelhalter, et al. Local computations with probabilities on graphical structures and their application to expert systems, 1990.

[31] Daniel Deutch, et al. Circuits for Datalog Provenance, 2014, ICDT.

[32] Diego Calvanese, et al. Decidable containment of recursive queries, 2003, Theor. Comput. Sci.

[33] Marijke H. L. Bodlaender. Probabilistic Inference and Monadic Second Order Logic, 2012, IFIP TCS.

[34] Jean-François Baget, et al. On rules with existential variables: Walking the decidability line, 2011, Artif. Intell.

[35] Michael Benedikt, et al. Monadic Datalog Containment, 2012, ICALP.

[36] Val Tannen, et al. Models for Incomplete and Probabilistic Information, 2006, IEEE Data Eng. Bull.

[37] Bruno Courcelle, et al. The Monadic Second-Order Logic of Graphs X: Linear Orderings, 1996, Theor. Comput. Sci.

[38] Markus Krötzsch. Efficient Rule-Based Inferencing for OWL EL, 2011, IJCAI.

[39] Diego Calvanese, et al. Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family, 2007, Journal of Automated Reasoning.

[40] Peter Lammich, et al. Tree Automata, 2009, Arch. Formal Proofs.

[41] Bruno Courcelle, et al. On the fixed parameter complexity of graph enumeration problems definable in monadic second-order logic, 2001, Discret. Appl. Math.

[42] Surajit Chaudhuri, et al. On the complexity of equivalence between recursive and nonrecursive Datalog programs, 1994, PODS '94.

[43] Bruno Courcelle, et al. The Monadic Second-Order Logic of Graphs. I. Recognizable Sets of Finite Graphs, 1990, Inf. Comput.

[44] Yehoshua Sagiv, et al. Running tree automata on probabilistic XML, 2009, PODS.

[45] Paul D. Seymour, et al. Graph Minors. II. Algorithmic Aspects of Tree-Width, 1986, J. Algorithms.

[46] Dan Suciu, et al. Efficient query evaluation on probabilistic databases, 2004, The VLDB Journal.

[47] Juan L. Reutter. Containment of Nested Regular Expressions, 2013, ArXiv.

[48] Tomasz Imielinski, et al. Incomplete Information in Relational Databases, 1984, JACM.

[49] F. Gavril. The intersection graphs of subtrees in tree are exactly the chordal graphs, 1974.

[50] Diego Calvanese, et al. Regular Path Queries in Expressive Description Logics with Nominals, 2009, IJCAI.

[51] Serge Abiteboul, et al. Foundations of Databases, 1994.

[52] Alin Deutsch, et al. Optimization Properties for Classes of Conjunctive Regular Path Queries, 2001, DBPL.

[53] Yehoshua Sagiv, et al. Query efficiency in probabilistic XML models, 2008, SIGMOD Conference.

[54] Lise Getoor, et al. Read-once functions and query evaluation in probabilistic databases, 2010, Proc. VLDB Endow.

[55] Val Tannen, et al. Faster query answering in probabilistic databases using read-once functions, 2010, ICDT '11.

[56] Christopher Ré, et al. Materialized Views in Probabilistic Databases for Information Exchange and Query Optimization, 2007, VLDB.

[57] Jean-François Baget, et al. Walking the Decidability Line for Rules with Existential Variables, 2010, KR.

[58] Dan Suciu, et al. Bridging the gap between intensional and extensional query evaluation in probabilistic databases, 2010, EDBT '10.

[59] Diego Calvanese, et al. Reasoning on regular path queries, 2003, SGMD.

[60] Hans L. Bodlaender. A linear time algorithm for finding tree-decompositions of small treewidth, 1993, STOC '93.

[61] Bruno Courcelle, et al. Recursive Queries and Context-free Graph Grammars, 1991, Theor. Comput. Sci.

[62] Diego Calvanese, et al. Answering Regular Path Queries in Expressive Description Logics: An Automata-Theoretic Approach, 2007, AAAI.

[63] Erich Grädel. Efficient Evaluation Methods for Guarded Logics and Datalog LITE, 2000, LPAR.

[64] Haim Gaifman, et al. Decidable optimization problems for database logic programs, 1988, STOC '88.

[65] Mikolaj Bojanczyk, et al. Transducers with Origin Information, 2013, ICALP.

[66] Laks V. S. Lakshmanan, et al. ProbView: a flexible probabilistic database system, 1997, TODS.

[67] Martin Otto, et al. Back and forth between guarded and modal logics, 2002, TOCL.

[68] Hector Garcia-Molina, et al. The Management of Probabilistic Data, 1992, IEEE Trans. Knowl. Data Eng.

[69] Diego Calvanese, et al. Nested Regular Path Queries in Description Logics, 2014, KR.

[70] Val Tannen, et al. Annotated XML: queries and provenance, 2008, PODS.

[71] Daniel Deutch, et al. On the Limitations of Provenance for Queries with Difference, 2011, TaPP.

[72] Dan Suciu, et al. On the tractability of query compilation and bounded treewidth, 2012, ICDT '12.

[73] Serge Abiteboul, et al. Regular Path Queries with Constraints, 1999, J. Comput. Syst. Sci.