Hybrid Parallelism for XML SAX Parsing

XML has been adopted across a wide spectrum of applications. Its parsing efficiency, however, remains a concern and can become a performance bottleneck. At the same time, with the trend toward multicore CPUs, parallelization has become an increasingly relevant way to improve performance. In previous work, we investigated parallelizing DOM-style parsing and obtained significant speedup. Streaming XML applications, however, often require SAX-style parsing. In this paper, we present a technique for, and an implementation of, a parallel XML SAX parser. To handle the inherent data dependencies in XML while still allowing reasonable scalability, we use a four-stage software pipeline that combines strictly sequential stages with stages that can be further data-parallelized internally. The approach is thus a hybrid of pipeline parallelism and data parallelism. To demonstrate its effectiveness, we test the approach on a Linux machine with two Intel Xeon L5320 CPUs, for a total of 8 physical cores, and obtain good speedup up to about 8 cores.
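The hybrid structure described above can be illustrated with a minimal sketch. The abstract does not say what each of the four stages does, so the stage roles here (split, scan, reorder, emit) are assumptions chosen for illustration, as are all function names. The input is assumed to be well-formed XML with no attributes, comments, or CDATA sections. Only the second stage is data-parallel; the other three are strictly sequential. Python threads are used to show the structure only and are not expected to reproduce the reported speedup.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed between stages


def split_stage(text, chunk_size, out_q, n_workers):
    # Sequential stage (hypothetical role): cut the input into numbered chunks.
    for seq, start in enumerate(range(0, len(text), chunk_size)):
        out_q.put((seq, start, text[start:start + chunk_size]))
    for _ in range(n_workers):
        out_q.put(SENTINEL)  # one end marker per worker


def scan_worker(in_q, out_q):
    # Data-parallel stage: workers scan chunks independently for '<'.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        seq, start, chunk = item
        offsets = [start + i for i, ch in enumerate(chunk) if ch == "<"]
        out_q.put((seq, offsets))


def reorder_stage(in_q, out_q, n_workers):
    # Sequential stage: restore document order, since workers finish out of order.
    pending, next_seq, finished = {}, 0, 0
    while finished < n_workers:
        item = in_q.get()
        if item is SENTINEL:
            finished += 1
            continue
        seq, offsets = item
        pending[seq] = offsets
        while next_seq in pending:
            out_q.put(pending.pop(next_seq))
            next_seq += 1
    out_q.put(SENTINEL)


def event_stage(text, in_q, events):
    # Sequential stage: turn tag offsets into SAX-like start/end events.
    while True:
        offsets = in_q.get()
        if offsets is SENTINEL:
            return
        for off in offsets:
            tag = text[off + 1:text.index(">", off)]
            if tag.startswith("/"):
                events.append(("end", tag[1:]))
            else:
                events.append(("start", tag))


def run_pipeline(text, n_workers=4, chunk_size=8):
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    events = []
    threads = [threading.Thread(target=split_stage,
                                args=(text, chunk_size, q1, n_workers))]
    threads += [threading.Thread(target=scan_worker, args=(q1, q2))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=reorder_stage,
                                    args=(q2, q3, n_workers)))
    threads.append(threading.Thread(target=event_stage,
                                    args=(text, q3, events)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for event in run_pipeline("<a><b>hi</b><c>x</c></a>"):
        print(event)
```

The reorder stage is what lets a data-parallel stage sit inside an otherwise ordered pipeline: chunks may be scanned in any order, but events still reach the consumer in document order.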
