A Dynamic Load-balancing Scheme for XPath Queries Parallelization in Shared Memory Multi-core Systems

With the rapid spread of multi-core processor systems, the parallelization of XPath queries in shared memory multi-core systems has received increasing attention. Existing work has developed parallelization methods based on cost estimation and static mapping, which can be seen as a logical optimization of the parallel query plan. However, static mapping may cause load imbalance that hurts overall performance, especially when nodes in the XML document are not evenly distributed. In this paper, we approach the problem from another angle, using parallelization techniques. We use dynamic mapping to improve XPath query performance; it achieves better load balance regardless of which XML document is queried, and is therefore a more general method than static mapping. First, we design a parallel XPath query algebra called PXQA (Parallel XPath Query Algebra) to describe the parallel query plan. Second, using PXQA, we extract the task-dependence graph, which defines which operations can be executed in parallel and helps analyze the overheads of dynamic mapping. Finally, we discuss how to partition the data under dynamic mapping adaptively, according to the runtime situation. Experimental results show that adaptive runtime parallelization of XPath queries achieves good performance in shared memory multi-core systems.
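The difference between static and dynamic mapping on a skewed document can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the scheme described in the paper: the per-subtree costs are hypothetical, static mapping is modeled as fixed contiguous blocks, and dynamic mapping is modeled as idle workers pulling the next task from a shared queue. The function names `static_makespan` and `dynamic_makespan` are illustrative and do not come from the paper.

```python
import heapq


def static_makespan(costs, workers):
    """Assign contiguous blocks of tasks to workers before execution starts."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def dynamic_makespan(costs, workers):
    """Simulate a shared queue: the worker that becomes idle first takes the next task."""
    finish_times = [0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical query costs for 12 subtrees of a skewed XML document:
# the first four subtrees are much larger than the rest.
costs = [8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1]

print("static :", static_makespan(costs, 4))
print("dynamic:", dynamic_makespan(costs, 4))
```

With these assumed costs, the static mapping places three of the large subtrees on one worker, while the simulated dynamic mapping spreads them across all four. The simulation ignores the scheduling and synchronization overheads that a real dynamic scheme has to pay.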
