Benefits of alternate XML serialization formats in scientific computing

XML is widely used for interoperable data representation, and its range of applications has grown further in recent years. This growth has exposed limitations and inefficiencies that appear inherent to XML, stemming from its verbosity and redundancy. In response, various industry groups and standardization organizations have set out to define alternate representations of XML data that better address their needs while retaining compatibility with XML. This paper gives an overview of the arguments for a binary format in scientific computing, reviews the W3C's work in this area, and presents benchmarks comparing XML with the various processing techniques that binary formats make available.
