Parallel XML processing by work stealing

As a language for semi-structured documents, XML has become the core of the web services architecture and plays a crucial role in messaging systems, databases, and document processing. However, XML processing is widely regarded as a performance bottleneck in most systems and applications. Meanwhile, multicore processors, which emerged as a response to the clock-speed limits of modern CPUs, have become increasingly prevalent, and exploiting the parallelism of multicore resources to speed up software execution is becoming a trend in software development. In this paper, we present a parallel processing model for XML documents. The model is not tied to one specific XML processing task; it is a general model with which various kinds of parallel XML document processing can be explored. The kernel of the model is a stealing-based dynamic load-balancing mechanism called ThreadCrew, with which multiple threads process disjoint parts of the XML document in parallel with a balanced load distribution. The model also provides a novel mechanism for tracing stealing actions, so that a result equivalent to the sequential one can be obtained by gluing the results of the parallel runs together. To show the feasibility and effectiveness of our approach, we present our C# implementation of parallel XML serialization. Our empirical study shows that the parallel XML serialization algorithm can significantly improve XML serialization performance on a multicore machine.
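The two ideas in the abstract, stealing-based load balancing and tracing each steal so that the partial outputs can be glued back into the sequential result, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C# ThreadCrew code. The names `Worker`, `serialize_parallel`, `glue`, and the "hole" placeholder are hypothetical and chosen only for this illustration. The sketch assumes a document without namespaces, comments, or processing instructions, and it busy-waits when no work is available. Under CPython's global interpreter lock the threads do not actually run in parallel, so it demonstrates the mechanism, not the speedup.

```python
import threading
import time
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


class Worker:
    """One thread's private stack of pending items (top = end of list)."""

    def __init__(self):
        self.stack = []
        self.lock = threading.Lock()


def glue(buf):
    """Flatten a buffer; nested lists are the outputs of stolen subtrees."""
    return "".join(p if isinstance(p, str) else glue(p) for p in buf)


def serialize_parallel(root, num_workers=4):
    workers = [Worker() for _ in range(num_workers)]
    root_buf = []                    # output buffer of the initial task
    pending = [1]                    # elements that are not yet expanded
    pending_lock = threading.Lock()
    workers[0].stack.append(("elem", root))

    def expand(me, elem, out):
        # Emit the start tag and text now; defer children and the end tag.
        attrs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in elem.items())
        out.append("<%s%s>%s" % (elem.tag, attrs, escape(elem.text or "")))
        children = list(elem)
        with pending_lock:
            pending[0] += len(children) - 1
        with me.lock:
            me.stack.append(("text", "</%s>" % elem.tag))
            for child in reversed(children):
                if child.tail:
                    me.stack.append(("text", escape(child.tail)))
                me.stack.append(("elem", child))

    def steal(me):
        # Take the oldest pending element of another worker and leave a
        # "hole" in its place: this is the trace of the stealing action.
        for victim in workers:
            if victim is me:
                continue
            with victim.lock:
                for i, (kind, payload) in enumerate(victim.stack):
                    if kind == "elem":
                        buf = []
                        victim.stack[i] = ("hole", buf)
                        return payload, buf
        return None

    def run(me, out):
        while True:
            with me.lock:
                item = me.stack.pop() if me.stack else None
            if item is None:
                stolen = steal(me)
                if stolen is not None:
                    elem, out = stolen      # switch to the stolen task's buffer
                    expand(me, elem, out)
                    continue
                with pending_lock:
                    if pending[0] == 0:     # no element left anywhere
                        return
                time.sleep(0)               # yield, then retry stealing
                continue
            kind, payload = item
            if kind == "elem":
                expand(me, payload, out)
            else:                           # "text" string or "hole" buffer
                out.append(payload)

    threads = [
        threading.Thread(target=run, args=(w, root_buf if i == 0 else None))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return glue(root_buf)


if __name__ == "__main__":
    doc = ET.fromstring("<a x='1'>t<b>u<c/>v</b>w<d k='2'>z</d><e/></a>")
    print(serialize_parallel(doc, num_workers=4))
```

In this sketch a thief replaces the stolen element in the victim's stack with a placeholder that points to the thief's output buffer. When the victim later reaches the placeholder, it appends that buffer to its own output, so flattening the root buffer after all threads have joined reproduces the document order regardless of which thread serialized which subtree.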
