Parallel XPath query based on cost optimization

XPath query performance is a key factor in XML processing capacity. Parallel processing that makes full use of multi-threaded computing resources is an important way to improve it. However, load imbalance and inefficient use of threads often degrade the performance of parallelized XPath queries. In this paper, we propose coPXQ, a parallel XPath query method based on cost optimization. The method improves the parallel processing of navigational XPath queries through three main optimizations. First, the storage of the XML node relation index is optimized, which improves both the storage efficiency and the access efficiency of the index. Second, a new cost estimation method based on the number of XML node relations is used to balance load in parallel relation index creation and parallel primitive execution. Third, the number of worker threads is determined by estimating parallel effectiveness, so that threads are used effectively during a query. Experimental results show that our method achieves better parallel performance than existing typical methods.
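The two cost-based ideas in the abstract (balancing load by estimated cost, and choosing a worker-thread count by estimated effectiveness) can be illustrated with a minimal sketch. This is not the coPXQ algorithm: the greedy assignment, the linear per-thread overhead model, and the names `balance_by_cost`, `pick_thread_count`, and `per_thread_overhead` are all assumptions made for illustration. Each task's cost is taken here to be its number of XML node relations.

```python
import heapq


def balance_by_cost(costs, workers):
    """Assign tasks to workers greedily: the costliest remaining task
    goes to the currently least-loaded worker (hypothetical scheme)."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    assignment = [[] for _ in range(workers)]
    # Visit tasks in descending order of estimated cost.
    for task, cost in sorted(enumerate(costs), key=lambda tc: -tc[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    return assignment


def pick_thread_count(total_cost, max_threads, per_thread_overhead):
    """Pick the thread count that minimizes an assumed time model:
    total_cost / t for the work plus a fixed overhead per extra thread."""
    def estimated_time(t):
        return total_cost / t + per_thread_overhead * (t - 1)

    return min(range(1, max_threads + 1), key=estimated_time)


# Illustrative node-relation counts for five tasks, split over two workers.
relation_counts = [8, 7, 6, 5, 4]
plan = balance_by_cost(relation_counts, workers=2)
threads = pick_thread_count(total_cost=100, max_threads=8, per_thread_overhead=10)
```

Under this assumed model, adding threads stops paying off once the per-thread overhead exceeds the work saved, which is the kind of trade-off a parallel effectiveness estimate is meant to capture.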
