Query automata

Structured document databases are commonly modeled by context-free and extended context-free grammars. A crucial difference between the two is that the derivation trees of the former are ranked, while those of the latter are not. A main task in document transformation and information retrieval is locating subtrees that satisfy some pattern. Unary queries, i.e., queries that map a tree to a set of its nodes, therefore play an important role in the context of structured document databases. We want to understand how the natural and well-studied computation model of tree automata can be used to express such queries. We define a query automaton (QA) as a deterministic two-way finite automaton over trees that can select nodes depending on the state and the label at those nodes. We study QAs over ranked as well as over unranked trees. More precisely, we characterize the expressiveness of the different formalisms by linking them to monadic second-order logic, and we establish the complexity of their non-emptiness and equivalence problems.
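To make the notion of a unary query concrete, the following is a minimal sketch, not the paper's construction. It simplifies the model to a single deterministic bottom-up pass over an unranked tree (a QA as defined above is two-way), and it lets the transition function be an arbitrary Python function on the tuple of child states rather than a regular language over states. The names `Node`, `run_query`, `delta`, and `select` are hypothetical and introduced only for illustration. What the sketch retains from the definition is that a node is selected based on the pair (state, label) at that node.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, List, Set, Tuple

Path = Tuple[int, ...]  # a node is identified by its child indices from the root


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def run_query(
    root: Node,
    delta: Callable[[str, Tuple[Hashable, ...]], Hashable],
    select: Callable[[Hashable, str], bool],
) -> Set[Path]:
    """Assign a state to every node bottom-up and collect the selected nodes.

    Assumption: one bottom-up pass only; the two-way movement of a QA is
    not modeled here.
    """
    selected: Set[Path] = set()

    def visit(node: Node, path: Path) -> Hashable:
        # Children are processed first; a node may have any number of them.
        child_states = tuple(
            visit(child, path + (i,)) for i, child in enumerate(node.children)
        )
        state = delta(node.label, child_states)
        # Selection depends only on the state and the label at this node.
        if select(state, node.label):
            selected.add(path)
        return state

    visit(root, ())
    return selected


# Example query: select every "section" whose subtree contains a "figure".
# The state is a boolean meaning "this subtree contains a figure".
def delta(label: str, child_states: Tuple[Hashable, ...]) -> bool:
    return label == "figure" or any(child_states)


def select(state: Hashable, label: str) -> bool:
    return bool(state) and label == "section"


doc = Node("doc", [
    Node("section", [Node("para"), Node("figure")]),
    Node("section", [Node("para")]),
    Node("section", [Node("section", [Node("figure")])]),
])

hits = run_query(doc, delta, select)
```

Here `hits` contains the paths `(0,)`, `(2,)`, and `(2, 0)`: the first top-level section, the third top-level section, and the section nested inside it. A query whose answer at a node depends on context outside that node's subtree cannot be computed this way in general, which is one reason a two-way model is of interest.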
