On the complexity of typechecking top-down XML transformations

We investigate the typechecking problem for XML transformations: statically verifying that every answer to a transformation conforms to a given output schema, for inputs satisfying a given input schema. As typechecking quickly turns undecidable for query languages capable of testing equality of data values, we return to the limited framework where we abstract XML documents as labeled ordered trees. We focus on simple top-down recursive transformations motivated by XSLT and structural recursion on trees. We parameterize the problem by several restrictions on the transformations (deleting, non-deleting, bounded width) and consider both tree automata and DTDs as input and output schemas. The complexity of the typechecking problems in this scenario ranges from PTIME to EXPTIME.
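To make the setting concrete, here is a minimal Python sketch, not taken from the paper, of a deleting top-down transformation on labeled ordered trees and a DTD conformance check on its output. The encoding is an assumption made for illustration: trees are `(label, children)` pairs, a rule maps a `(state, label)` pair to a template, a bare string inside a template means "process all children in that state", and a DTD maps each label to a regular expression over space-terminated child labels. The names `transform`, `expand`, `conforms`, `rules`, and `out_dtd` are hypothetical. The sketch only tests individual inputs; typechecking proper must decide conformance for every tree in the input schema.

```python
import re

# A tree is a pair (label, [children]); a hedge is a list of trees.

def transform(rules, state, tree):
    """Apply the rule for (state, label) top-down; returns the output hedge."""
    label, children = tree
    template = rules.get((state, label), [])  # no rule: the subtree is deleted
    return [out for item in template for out in expand(rules, item, children)]

def expand(rules, item, children):
    """Instantiate one template item against the children of the current node."""
    if isinstance(item, str):
        # State call: process every child, in order, in the named state.
        return [out for child in children for out in transform(rules, item, child)]
    out_label, sub_items = item
    return [(out_label, [o for s in sub_items for o in expand(rules, s, children)])]

def conforms(dtd, tree):
    """Check that every node's child-label sequence matches its DTD rule."""
    label, children = tree
    word = "".join(child[0] + " " for child in children)
    return (label in dtd
            and re.fullmatch(dtd[label], word) is not None
            and all(conforms(dtd, child) for child in children))

# Hypothetical table-of-contents transformation: one entry per chapter,
# everything else (titles, paragraphs) is deleted.
rules = {
    ("q", "book"): [("toc", ["q"])],
    ("q", "chapter"): [("entry", [])],
}

# Hypothetical output DTD: toc -> entry+, entry -> empty.
out_dtd = {"toc": "(entry )+", "entry": ""}

book = ("book", [("title", []),
                 ("chapter", [("title", []), ("para", [])]),
                 ("chapter", [("title", [])])])
empty_book = ("book", [("title", [])])

result = transform(rules, "q", book)
print(result, conforms(out_dtd, result[0]))
print(conforms(out_dtd, transform(rules, "q", empty_book)[0]))
```

In this sketch the transformation typechecks against `out_dtd` only if the input schema requires at least one chapter per book. A book without chapters yields an empty `toc`, which violates `entry+`, so an input schema allowing zero chapters would make typechecking fail.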
