Efficient incremental validation of XML documents

We discuss incremental validation of XML documents with respect to DTDs and XML Schema definitions. We consider insertions and deletions of entire subtrees, not only of leaf nodes, and we also consider the validation of ID and IDREF attributes. For arbitrary schemas, we give an algorithm that runs in worst-case O(n log n) time and linear space, and show that it is often far superior to revalidation from scratch. We present two classes of schemas, which capture most real-life DTDs, and show that they admit a logarithmic-time incremental validation algorithm that, in many cases, requires only constant auxiliary space. We then discuss an implementation of these algorithms that is independent of, and can be customized for, different storage mechanisms for XML. Finally, we present extensive experimental results showing that our approach is highly efficient and scalable.
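
To make the contrast between revalidation from scratch and incremental validation concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a hypothetical toy DTD (`DTD`) in which each content model is a regular expression over space-terminated child names, and hypothetical helpers `Node`, `content_ok`, `validate_full`, and `insert_and_validate`. After a subtree insertion it rechecks only the new subtree and the parent's child sequence. It ignores ID and IDREF attributes, and it rescans all siblings of the insertion point, so it does not achieve the logarithmic bound stated above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy DTD (an assumption for illustration): each element name
# maps to a regular expression over the space-terminated names of its children.
DTD = {
    "library": r"(book )*",
    "book": r"title (author )+",
    "title": r"",
    "author": r"",
}

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def content_ok(node):
    """Check one node's child sequence against its content model."""
    model = DTD.get(node.name)
    if model is None:  # undeclared element
        return False
    word = "".join(child.name + " " for child in node.children)
    return re.fullmatch(model, word) is not None

def validate_full(node):
    """Revalidate from scratch; returns (valid, number of nodes checked)."""
    ok, checked = content_ok(node), 1
    for child in node.children:
        child_ok, n = validate_full(child)
        ok, checked = ok and child_ok, checked + n
    return ok, checked

def insert_and_validate(parent, index, subtree):
    """Insert a subtree into a document assumed valid and recheck only
    what changed: the new subtree and the parent's child sequence."""
    parent.children.insert(index, subtree)
    sub_ok, checked = validate_full(subtree)
    return sub_ok and content_ok(parent), checked + 1

def make_book():
    return Node("book", [Node("title"), Node("author")])

doc = Node("library", [make_book() for _ in range(3)])
full_result = validate_full(doc)
incremental_result = insert_and_validate(doc, 1, make_book())
bad_result = insert_and_validate(doc, 0, Node("author"))
```

In this sketch the full pass visits every node of the document, while the incremental check visits only the inserted subtree plus its parent, which is the source of the savings over revalidation from scratch when updates are small relative to the document.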
