Bounded repairability for regular tree languages

We study the problem of bounded repairability of a given restriction tree language R into a target tree language T. More precisely, we say that R is bounded repairable with respect to T if there exists a bound on the number of standard tree editing operations that must be applied to any tree in R to obtain a tree in T. We consider several possible specifications for tree languages: bottom-up tree automata (on the curry encoding of unranked trees), which capture the class of XML schemas, and document type definitions (DTDs). We also consider the special case in which the restriction language R is universal (i.e., contains all trees over a given alphabet). We give an effective characterization of bounded repairability between pairs of tree languages represented by automata. This characterization introduces two tools, synopsis trees and a coverage relation between them, which allow one to reason about tree languages that undergo a bounded number of editing operations. We then employ this characterization to provide upper bounds on the complexity of deciding bounded repairability and show that these bounds are tight. In particular, when the input tree languages are specified by arbitrary bottom-up automata, the problem is coNExp-complete. The problem remains coNExp-complete even if deterministic nonrecursive DTDs are used to specify the input languages. The complexity drops if the alphabet (the set of node labels) is fixed: the problem becomes PSpace-complete for nonrecursive DTDs and coNP-complete for deterministic nonrecursive DTDs. Finally, when the restriction tree language R is universal, we show that the bounded repairability problem becomes Exp-complete if the target language is specified by an arbitrary bottom-up tree automaton, and becomes tractable (P-complete, in fact) if a deterministic bottom-up automaton is used.
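
For concreteness, the definition of bounded repairability stated above can be written as a single formula. This is only a sketch of the notation: the symbol d(t, t'), standing for the minimum number of standard tree editing operations needed to transform t into t', and the bound k are names assumed here for illustration and do not appear in the text above.

```latex
% Assumed notation: d(t, t') = least number of standard tree editing
% operations that turn the tree t into the tree t'.
\[
  R \text{ is bounded repairable with respect to } T
  \quad\Longleftrightarrow\quad
  \exists k \in \mathbb{N} \;\; \forall t \in R \;\; \exists t' \in T : \;\; d(t, t') \le k .
\]
```

In the special case where R is universal, the condition reads: some fixed k exists such that every tree over the alphabet is within k editing operations of a tree in T.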
