Multi-Core Processing of XML Twig Patterns

XML is based on a tree-structured data model. Naturally, XPath, the most popular XML query language, uses patterns of selection predicates on multiple elements related by a tree structure, and these patterns can often be abstracted as twig patterns. Finding all occurrences of such a twig pattern in an XML database is a basic operation in XML query processing. We present the parallel path stack algorithm (PPS) and the parallel twig stack algorithm (PTS), novel and efficient algorithms for matching XML query twig patterns on a parallel multi-threaded computing platform. PPS and PTS are based on the PathStack and TwigStack algorithms [1]. These algorithms employ a sophisticated search technique that limits processing to specific subtrees. We conducted extensive experiments with PPS and PTS, comparing them to the standard (sequential) PathStack and TwigStack algorithms in terms of run time to completion and measuring their performance for varying numbers of threads. Experimental results indicate that PPS and PTS significantly reduce query running time compared with the PathStack and TwigStack algorithms (up to 44 times faster for DBLP queries and up to 22 times faster for XMark queries).
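To make the idea of restricting work to specific subtrees concrete, the following is a minimal sketch, not the PPS or PTS algorithm itself. It assumes a linear ancestor-descendant path pattern (such as //book//author), keeps the open partial matches contributed by the ancestors of each node, and hands every child subtree of the document root to a separate thread. The names `extend`, `match_subtree`, and `parallel_match`, as well as the sample document, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def extend(prefixes, node, tags):
    """Visit one node: return (complete matches, open prefixes for its descendants).

    Every prefix is a tuple of already matched ancestor elements and is
    strictly shorter than the pattern, so tags[len(p)] is always valid.
    """
    grown = [p + (node,) for p in prefixes if tags[len(p)] == node.tag]
    complete = [p for p in grown if len(p) == len(tags)]
    still_open = prefixes + [p for p in grown if len(p) < len(tags)]
    return complete, still_open


def match_subtree(node, tags, prefixes):
    """Sequentially match the pattern inside one subtree (explicit DFS stack)."""
    matches = []
    stack = [(node, prefixes)]
    while stack:
        current, open_prefixes = stack.pop()
        complete, open_prefixes = extend(open_prefixes, current, tags)
        matches.extend(complete)
        for child in current:
            stack.append((child, open_prefixes))
    return matches


def parallel_match(root, tags, workers=4):
    """Assumed partitioning scheme: one task per child subtree of the root."""
    complete, open_prefixes = extend([()], root, tags)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda child: match_subtree(child, tags, open_prefixes), list(root)
        )
        for part in parts:
            complete.extend(part)
    return complete


# Hypothetical sample document, loosely shaped like DBLP records.
DOC = """<dblp>
  <article id="a1"><author id="u1"/><title id="t1"/></article>
  <book id="b1"><chapter id="c1"><author id="u2"/></chapter><author id="u3"/></book>
  <book id="b2"><title id="t2"/></book>
</dblp>"""

root = ET.fromstring(DOC)
found = parallel_match(root, ["book", "author"])
ids = sorted(tuple(e.get("id") for e in match) for match in found)
print(ids)
```

Because the subtrees are disjoint and the shared prefix list is never mutated, the per-thread results can simply be concatenated. In standard CPython builds, threads do not execute Python bytecode in parallel, so this sketch shows only the partitioning structure and should not be expected to reproduce the speedups reported above.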

[1] Wenfei Fan, et al. Distributed query evaluation with performance guarantees, 2007, SIGMOD '07.

[2] Oded Shmueli, et al. Parallelization of XPath queries using multi-core processors: challenges and experiences, 2009, EDBT '09.

[3] Ioana Manolescu, et al. XMark: A Benchmark for XML Data Management, 2002, VLDB.

[4] Hiroyuki Kitagawa, et al. Parallel holistic twig joins on a multi-core system, 2010, Int. J. Web Inf. Syst.

[5] Wei Lu, et al. A Parallel Approach to XML Parsing, 2006, 2006 7th IEEE/ACM International Conference on Grid Computing.

[6] Ying Zhang, et al. Parsing XML using parallel traversal of streaming trees, 2008, HiPC'08.

[7] Wei Lu, et al. Parallel XML processing by work stealing, 2007, SOCP '07.

[8] Georg Gottlob, et al. Efficient Algorithms for Processing XPath Queries, 2002, VLDB.

[9] Lipyeow Lim, et al. Statistics-based parallelization of XPath queries in shared memory systems, 2010, EDBT '10.

[10] Jianhui Li, et al. An Efficient Parallel PathStack Algorithm for Processing XML Twig Queries on Multi-core Systems, 2010, DASFAA.

[11] Hiroyuki Kitagawa, et al. XML data partitioning strategies to improve parallelism in parallel holistic twig joins, 2009, ICUIMC '09.

[12] Jianhui Li, et al. Parallel Structural Join Algorithm on Shared-Memory Multi-Core Systems, 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[13] Wenfei Fan, et al. Using partial evaluation in distributed query evaluation, 2006, VLDB.

[14] Chen Yongheng, et al. Load balancing parallelizing XML query processing based on shared cache chip multi-processor (CMP), 2011.

[15] Kam-Fai Wong, et al. WIN: an efficient data placement strategy for parallel XML databases, 2005, 11th International Conference on Parallel and Distributed Systems (ICPADS'05).

[16] Ying Zhang, et al. Parallel XML Parsing Using Meta-DFAs, 2007, Third IEEE International Conference on e-Science and Grid Computing (e-Science 2007).

[17] Hongjun Lu, et al. Holistic Twig Joins on Indexed XML Documents, 2003, VLDB.

[18] Divesh Srivastava, et al. Holistic twig joins: optimal XML pattern matching, 2002, SIGMOD '02.

[19] Wei Lu, et al. ParaXML: A Parallel XML Processing Model on the Multicore CPUs, 2007.

[20] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[21] Hiroyuki Kitagawa, et al. Executing parallel TwigStack algorithm on a multi-core system, 2009, iiWAS.