TDX: a high-performance table-driven XML parser

This paper presents TDX, a table-driven XML parser. TDX combines parsing and validation into a single pass to increase the performance of XML-based applications such as Web services. The TDX approach is based on the observation that context-free grammars can be derived automatically from XML schemas. We developed a parser construction tool that automatically constructs TDX grammar productions from a schema. Grammar tokens are defined by the element names, attribute names, and text specific to the schema. Because most of the structural constraints of an XML schema are cast as grammar rules, XML instances can be parsed and validated efficiently in the same pass. The results show that TDX is several times faster than DOM or SAX parsing with validation enabled.
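To make the one-pass idea concrete, the following is a minimal sketch of a table-driven LL(1) predictive parser in Python, assuming a hypothetical toy schema in which a `person` element contains one `name` element followed by zero or more `email` elements. The grammar, the table `TABLE`, and the names `tokenize`, `parse`, and `ValidationError` are illustrative assumptions, not TDX's actual construction tool or generated tables; attributes, namespaces, and datatype checks are omitted. Element tags and text act as grammar tokens, and a missing table entry signals a schema violation, so structural validation happens while the document is being parsed.

```python
import re

# Hypothetical grammar derived from a toy schema (not TDX's real output):
#   Person ::= "<person>" Name Emails "</person>"
#   Name   ::= "<name>" TEXT "</name>"
#   Emails ::= Email Emails | (empty)    -- email: minOccurs=0, maxOccurs=unbounded
#   Email  ::= "<email>" TEXT "</email>"
# LL(1) parse table: (nonterminal, lookahead terminal) -> right-hand side.
TABLE = {
    ("Person", "<person>"): ["<person>", "Name", "Emails", "</person>"],
    ("Name", "<name>"): ["<name>", "TEXT", "</name>"],
    ("Emails", "<email>"): ["Email", "Emails"],
    ("Emails", "</person>"): [],  # empty production, predicted by FOLLOW(Emails)
    ("Email", "<email>"): ["<email>", "TEXT", "</email>"],
}
NONTERMINALS = {"Person", "Name", "Emails", "Email"}

# Tags without attributes, or runs of character data.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


class ValidationError(Exception):
    pass


def tokenize(xml):
    """Split a tiny attribute-free XML subset into (terminal, value) pairs."""
    tokens, pos = [], 0
    while pos < len(xml):
        m = TOKEN_RE.match(xml, pos)
        if m is None:
            raise ValidationError(f"unsupported markup at offset {pos}")
        pos = m.end()
        if m.group(3) is not None:
            text = m.group(3).strip()
            if text:  # skip whitespace-only text between tags
                tokens.append(("TEXT", text))
        else:
            # Terminal is the tag itself, e.g. "<name>" or "</name>".
            tokens.append((f"<{m.group(1)}{m.group(2)}>", m.group(2)))
    tokens.append(("$", None))  # end-of-input marker
    return tokens


def parse(xml, start="Person"):
    """Parse and validate in one pass; return (element, text) pairs."""
    tokens = tokenize(xml)
    stack = ["$", start]  # predictive-parser stack, top at the end
    open_elems, values, i = [], [], 0
    while stack:
        top = stack.pop()
        terminal, value = tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, terminal))
            if rhs is None:  # no table entry: the input violates the schema
                raise ValidationError(f"unexpected {terminal} while expecting {top}")
            stack.extend(reversed(rhs))  # push RHS so its first symbol is on top
        elif top == terminal:
            if terminal == "TEXT":
                values.append((open_elems[-1], value))
            elif terminal.startswith("</"):
                open_elems.pop()
            elif terminal != "$":
                open_elems.append(value)
            i += 1  # consume the matched token
        else:
            raise ValidationError(f"expected {top}, found {terminal}")
    return values
```

With this table, `parse("<person><name>Ada</name></person>")` returns `[("name", "Ada")]`, while omitting `<name>` raises `ValidationError` at the first token that has no table entry. Each token costs one table lookup, and a schema violation is simply a failed lookup, which is why validation adds little overhead to the parse in this style of design.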
