A system for the static analysis of XPath

XPath is the standard language for navigating XML documents and returning a set of matching nodes. We present a sound and complete decision procedure for containment of XPath queries, as well as other related XPath decision problems such as satisfiability, equivalence, overlap, and coverage. The considered XPath fragment covers most of the language features used in practice. Specifically, we propose a unifying logic for XML, namely, the alternation-free modal μ-calculus with converse. We show how to translate major XML concepts such as XPath and regular XML types (including DTDs) into this logic. Based on these embeddings, we show how XPath decision problems, in the presence or absence of XML types, can be solved using a decision procedure for μ-calculus satisfiability. We provide a complexity analysis of our system together with practical experiments to illustrate the efficiency of the approach for realistic scenarios.
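As a hedged illustration of how these decision problems might reduce to satisfiability, the sketch below uses notation that is assumed here and not taken from the text above: $\varphi_p$ stands for the μ-calculus translation of an XPath expression $p$, and $\varphi_T$ for the translation of an XML type $T$. It shows the standard shape of such reductions, not necessarily the exact formulas used by the system.

```latex
% Sketch only: \varphi_p and \varphi_T are assumed names for the logical
% translations of an XPath expression p and an XML type T.
\begin{align*}
  \text{satisfiability of } p
    &\iff \varphi_p \text{ is satisfiable} \\
  \text{containment } p_1 \subseteq p_2
    &\iff \varphi_{p_1} \wedge \neg\varphi_{p_2} \text{ is unsatisfiable} \\
  \text{equivalence } p_1 \equiv p_2
    &\iff p_1 \subseteq p_2 \text{ and } p_2 \subseteq p_1 \\
  \text{overlap of } p_1 \text{ and } p_2
    &\iff \varphi_{p_1} \wedge \varphi_{p_2} \text{ is satisfiable} \\
  \text{coverage of } p \text{ by } p_1, \dots, p_n
    &\iff \varphi_p \wedge \neg(\varphi_{p_1} \vee \dots \vee \varphi_{p_n})
          \text{ is unsatisfiable}
\end{align*}
% In the presence of a type T, one would additionally conjoin \varphi_T,
% e.g. containment under T: \varphi_T \wedge \varphi_{p_1} \wedge \neg\varphi_{p_2}
% is unsatisfiable.
```

Under this reading, a single satisfiability solver suffices for all five problems, and an unsatisfiability result corresponds to the property holding over every document (or every document valid against $T$).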
