Visibly pushdown automata for streaming XML

We propose the study of visibly pushdown automata (VPAs) for processing XML documents. VPAs are pushdown automata in which the input symbol determines the stack operation, and XML documents are naturally visibly pushdown: the VPA pushes onto the stack on open-tags and pops the stack on close-tags. In this paper we demonstrate the power and ease that visibly pushdown automata bring to the design of streaming algorithms for XML documents. We study the problem of type-checking streaming XML documents against SDTD schemas, and the problem of typing tags in a streaming XML document according to an SDTD schema. For the latter problem, we consider both pre-order typing and post-order typing of a document, which dynamically determine types at open-tags and close-tags, respectively, as soon as they are encountered. We also generalize the problems of pre-order and post-order typing to prefix querying. We show that a deterministic VPA yields a one-pass algorithm that computes the set of all answers to any query with the property that whether a node satisfies the query is determined solely by the prefix leading to the node. All the streaming algorithms we develop in this paper are based on the construction of deterministic VPAs, and hence, for any fixed problem, the algorithms process each element of the input in constant time and use space O(d), where d is the depth of the document.
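
To make the push-on-open, pop-on-close discipline concrete, the following is a minimal sketch of a deterministic VPA run over a stream of tag events. It is an illustration under stated assumptions, not the construction from the paper: the `("open", tag)` / `("close", tag)` event encoding, the `VPA` class, the state names, and the toy schema (a `doc` element containing zero or more empty `item` elements) are all hypothetical. Each event costs one dictionary lookup, and the stack never grows beyond the current nesting depth.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event encoding: ("open", tag) or ("close", tag).
Event = Tuple[str, str]


class VPA:
    """Toy deterministic visibly pushdown automaton over tag events."""

    def __init__(
        self,
        start: str,
        accepting: Set[str],
        call: Dict[Tuple[str, str], Tuple[str, str]],
        ret: Dict[Tuple[str, str, str], str],
    ) -> None:
        self.start = start
        self.accepting = accepting
        self.call = call  # (state, tag) -> (next_state, pushed_symbol)
        self.ret = ret    # (state, popped_symbol, tag) -> next_state

    def run(self, events: Iterable[Event]) -> Tuple[bool, int]:
        """Return (accepted, maximum stack height seen)."""
        state = self.start
        stack: List[str] = []
        max_depth = 0
        for kind, tag in events:
            if kind == "open":
                # Open-tags always push exactly one symbol.
                step = self.call.get((state, tag))
                if step is None:
                    return False, max_depth
                state, symbol = step
                stack.append(symbol)
                max_depth = max(max_depth, len(stack))
            else:
                # Close-tags always pop exactly one symbol.
                if not stack:
                    return False, max_depth
                symbol = stack.pop()
                nxt = self.ret.get((state, symbol, tag))
                if nxt is None:
                    return False, max_depth
                state = nxt
        return (state in self.accepting and not stack), max_depth


# Toy schema (assumption): <doc> holds zero or more empty <item> elements.
# The pushed symbol records the state to resume in after the matching close.
toy = VPA(
    start="start",
    accepting={"done"},
    call={
        ("start", "doc"): ("in_doc", "done"),
        ("in_doc", "item"): ("in_item", "in_doc"),
    },
    ret={
        ("in_item", "in_doc", "item"): "in_doc",
        ("in_doc", "done", "doc"): "done",
    },
)

good = [("open", "doc"), ("open", "item"), ("close", "item"),
        ("open", "item"), ("close", "item"), ("close", "doc")]
bad = [("open", "doc"), ("open", "item"), ("close", "doc")]

print(toy.run(good))  # (True, 2)
print(toy.run(bad))   # (False, 2)
```

In this sketch the stack holds one symbol per currently open element, which is where the space bound in terms of document depth comes from; the transition tables are fixed once the schema is fixed, so the per-event work does not depend on the length of the document.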
