ParaParse: A parallel method for XML parsing

Any manipulation of XML data must be preceded by parsing. XML parsing is CPU intensive and tends to significantly affect the performance of XML applications in general. Meanwhile, with the spread of multi-core computers, parallel computing is widely used to solve practical problems, so parallelizing the parsing process is a natural and promising approach. Existing parallel parsing methods need a pre-parsing stage to obtain suitable data partitions. Unfortunately, pre-parsing is often time consuming and difficult to optimize. ParaParse, presented in this paper, is a novel parallel method for XML parsing. It uses a lightweight data partitioning scheme and supports parsing arbitrarily partitioned XML segments in parallel. Subtree merging is then carried out to generate the global XML tree. The parsing result can be further wrapped to support sophisticated XML queries. Experimental results show that ParaParse is well suited to parallel XML parsing in multi-core environments.
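
The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of partitioning without pre-parsing, parsing segments independently, and merging the partial subtrees. It is not the ParaParse algorithm itself. All names (`split_points`, `parse_segment`, `merge`, `parallel_parse`) are hypothetical, and the sketch assumes a drastically simplified XML subset: elements and text only, with no attributes, comments, CDATA sections, or processing instructions. Under that assumption a cut placed at any `<` character never splits a tag or a text node.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed subset: <name>, </name>, <name/> and plain text (no attributes etc.).
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")

def split_points(xml, n):
    """Cut xml into at most n segments; each cut is moved forward to the next '<'."""
    cuts = [0]
    step = max(1, len(xml) // n)
    for i in range(1, n):
        pos = xml.find("<", max(i * step, cuts[-1] + 1))
        if pos == -1:
            break
        cuts.append(pos)
    cuts.append(len(xml))
    return [xml[a:b] for a, b in zip(cuts, cuts[1:])]

def parse_segment(segment):
    """Parse one segment alone. Returns (items, open_path):
    items     - top-level nodes, text, and ("close", name) markers for end
                tags whose start tag lies in an earlier segment
    open_path - nodes opened here that are still unclosed at segment end
    """
    items, stack = [], []
    for m in TOKEN.finditer(segment):
        closing, name, selfclose, text = m.groups()
        if text is not None:
            if text.strip():  # ignore whitespace-only text
                (stack[-1]["children"] if stack else items).append(text)
        elif closing:
            if stack:
                if stack.pop()["tag"] != name:
                    raise ValueError("mismatched end tag: " + name)
            else:
                items.append(("close", name))
        else:
            node = {"tag": name, "children": []}
            (stack[-1]["children"] if stack else items).append(node)
            if not selfclose:
                stack.append(node)
    return items, stack

def merge(results):
    """Stitch per-segment results, in document order, into one tree."""
    root = {"tag": None, "children": []}
    stack = [root]
    for items, open_path in results:
        for item in items:
            if isinstance(item, tuple):  # end tag left unmatched in its segment
                if stack.pop()["tag"] != item[1]:
                    raise ValueError("mismatched end tag: " + item[1])
            else:
                stack[-1]["children"].append(item)
        stack.extend(open_path)  # these stay open for later segments
    if len(stack) != 1:
        raise ValueError("unclosed elements at end of document")
    return root["children"][0]

def parallel_parse(xml, workers=4):
    segments = split_points(xml, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(parse_segment, segments))
    return merge(results)

doc = "<a><b>hi</b><c><d/>x</c><e>yo</e></a>"
tree = parallel_parse(doc, workers=4)
```

Threads are used here only to keep the sketch short. In CPython the global interpreter lock prevents CPU-bound threads from running simultaneously, so a real speedup would need a process pool or a parser written in a language with native threads. The merge step is sequential but touches only the top-level items of each segment, which is what keeps it cheap relative to the per-segment parsing.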
