Logical definability and query languages over unranked trees

Unranked trees, that is, trees with no restriction on the number of children of nodes, have recently attracted much attention, primarily as an abstraction of XML (Extensible Markup Language) documents. In this paper, we study logical definability over unranked trees, as well as over collections of unranked trees, which can be viewed as databases of XML documents. The traditional approach to definability is to view each tree as a structure of a fixed vocabulary, and to study the expressive power of various logics on trees. A different approach, based on model theory, considers a structure whose universe is the set of all trees, and studies definable sets and relations; this approach extends smoothly to the setting of definability over collections of trees. We study the latter, model-theoretic approach. We find sets of operations on unranked trees that define regular tree languages, and show that some natural restrictions correspond to logics studied in the context of XML pattern languages. We then look at relational calculi over collections of unranked trees, and obtain quantifier-restriction results that give us bounds on expressive power and complexity. As unrestricted relational calculi can express problems complete for each level of the polynomial hierarchy, we look at their restrictions and find several calculi with low (NC^1) data complexity that can express important XML properties such as DTD validation and XPath evaluation.
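As an informal illustration of the objects mentioned above, the following minimal sketch models an unranked tree and a simplified form of DTD validation. It is not the paper's construction: the names `Node`, `validates`, and the sample `dtd` are hypothetical, and a DTD is assumed here to be a map from each element label to a regular expression over the sequence of child labels.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of an unranked tree: a label and any number of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def validates(node: Node, dtd: Dict[str, str]) -> bool:
    """Check a tree against a simplified DTD (an assumption of this sketch).

    Each label maps to a regular expression over the word formed by the
    children's labels, where every label is followed by a single space.
    """
    pattern = dtd.get(node.label)
    if pattern is None:
        # Labels not declared in the DTD are rejected.
        return False
    child_word = "".join(child.label + " " for child in node.children)
    if re.fullmatch(pattern, child_word) is None:
        return False
    # The content model must hold at every node of the tree.
    return all(validates(child, dtd) for child in node.children)


# Hypothetical DTD: a book has a title followed by one or more chapters.
dtd = {
    "book": r"title (chapter )+",
    "title": r"",
    "chapter": r"(section )*",
    "section": r"",
}

doc = Node("book", [
    Node("title"),
    Node("chapter", [Node("section"), Node("section")]),
    Node("chapter"),
])
bad = Node("book", [Node("chapter"), Node("title")])

print(validates(doc, dtd))
print(validates(bad, dtd))
```

The point of the sketch is only that the number of children is unbounded, so the constraint at each node is a regular language over child labels rather than a fixed-arity rule.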
