Model Theory of XPath on Data Trees. Part I: Bisimulation and Characterization

We investigate model-theoretic properties of XPath with data (in)equality tests over the class of data trees, i.e., trees in which each node carries a label from a finite alphabet and a data value from an infinite domain. We provide notions of (bi)simulation for XPath logics that navigate the tree using the child, parent, ancestor and descendant axes. We show that these notions precisely characterize the equivalence relation associated with each logic. We study formula complexity measures given by the number of nested axes and nested subformulas in a formula; these measures are akin to quantifier rank in first-order logic. We show characterization results for fine-grained notions of equivalence and (bi)simulation that take these complexity measures into account. We also prove that the positive fragments of these logics correspond to the formulas preserved under (non-symmetric) simulations. We show that the logic including the child axis is equivalent to the fragment of first-order logic invariant under the corresponding notion of bisimulation. If upward navigation is allowed, the characterization fails, but a weaker result can still be established. These results hold both over the class of possibly infinite data trees and over the class of finite data trees. We argue that, besides their intrinsic theoretical value, bisimulations are useful tools for proving (non)expressivity results for the logics studied here, and we substantiate this claim with examples.
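
To make the setting concrete, the following is a minimal sketch of a data tree and of two data tests along the child axis. It is an illustration only, not the paper's formal definitions: the names `DataTree`, `child_data_equal` and `child_data_differ` are hypothetical, and integers stand in for the infinite data domain. The sketch assumes the usual existential reading of data tests, under which an equality test (written here as ⟨↓[a] = ↓[b]⟩) holds at a node if some a-labelled child and some b-labelled child carry the same data value, and an inequality test holds if some such pair carries different values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataTree:
    """A node with a label (finite alphabet) and a data value (infinite domain)."""
    label: str
    data: int  # assumption: integers stand in for the infinite data domain
    children: List["DataTree"] = field(default_factory=list)


def child_data_equal(node: DataTree, a: str, b: str) -> bool:
    """True if some a-child and some b-child of `node` have equal data values."""
    return any(
        x.data == y.data
        for x in node.children if x.label == a
        for y in node.children if y.label == b
    )


def child_data_differ(node: DataTree, a: str, b: str) -> bool:
    """True if some a-child and some b-child of `node` have different data values."""
    return any(
        x.data != y.data
        for x in node.children if x.label == a
        for y in node.children if y.label == b
    )


# Two small trees with the same labels but different data values.
t1 = DataTree("r", 0, [DataTree("a", 1), DataTree("b", 1), DataTree("b", 2)])
t2 = DataTree("r", 0, [DataTree("a", 1), DataTree("b", 2)])

print(child_data_equal(t1, "a", "b"), child_data_differ(t1, "a", "b"))
print(child_data_equal(t2, "a", "b"), child_data_differ(t2, "a", "b"))
```

In `t1` both tests hold at the root, which shows that under this reading the inequality test is not the negation of the equality test. The trees `t1` and `t2` agree on labels and shape but are separated by the equality test, so any notion of bisimulation matching such a logic has to account for data values as well as labels.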
