A Parallel XPath Engine Based on Concurrent NFA Execution

The importance of XPath in XML filtering systems has led to a significant body of research on improving the processing performance of XPath queries. Most of this work, however, has targeted a single processing core. Given the prevalence of multicore processors, we believe that a parallel approach can provide significant benefits for a number of application scenarios. In this paper, we therefore investigate the use of multiple threads to concurrently process XPath queries on a shared incoming XML document. Building on YFilter, we divide the single NFA that YFilter constructs from the query set into several smaller NFAs that are processed concurrently. We implement and test two load-balancing strategies: a static approach and a dynamic approach. We evaluate our approach on an eight-core machine and show that it provides reasonable speedup up to eight cores.
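How the NFA is divided and how the two load-balancing strategies assign work are not detailed here. The following is a minimal Python sketch built on our own assumptions, not the authors' implementation. Queries are linear paths using only `/`, `//` and `*` (no predicates). The NFA follows the YFilter idea of sharing common query prefixes, with `//` compiled as an epsilon edge into a self-looping state. Queries are split round-robin into sub-NFAs. "Static" means one fixed sub-NFA per thread, while "dynamic" is approximated by creating more sub-NFAs than threads and letting idle pool threads pick up the next one. All names (`NFA`, `filter_parallel`, `partitions`) are hypothetical. Because of CPython's global interpreter lock, the sketch illustrates the structure only and will not show real multicore speedup.

```python
import io
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parse_steps(xpath):
    """Split a linear path such as '/a//b/*' into [('/', 'a'), ('//', 'b'), ('/', '*')]."""
    return re.findall(r"(//?)([^/]+)", xpath)


class NFA:
    """YFilter-style NFA (sketch): queries share common prefixes; accepting states map to query ids."""

    def __init__(self):
        self.trans = [{}]   # state -> {label: next_state}; "" labels an epsilon edge
        self.loops = set()  # states with an implicit "*" self-loop (introduced by "//")
        self.accept = {}    # accepting state -> set of query ids

    def _step(self, state, label):
        # Reuse an existing edge so that queries with a common prefix share states.
        nxt = self.trans[state].get(label)
        if nxt is None:
            nxt = len(self.trans)
            self.trans.append({})
            self.trans[state][label] = nxt
        return nxt

    def add_query(self, qid, xpath):
        state = 0
        for axis, name in parse_steps(xpath):
            if axis == "//":
                state = self._step(state, "")  # epsilon edge into a self-looping state
                self.loops.add(state)
            state = self._step(state, name)
        self.accept.setdefault(state, set()).add(qid)

    def _closure(self, states):
        # Follow epsilon edges transitively.
        out, todo = set(states), list(states)
        while todo:
            t = self.trans[todo.pop()].get("")
            if t is not None and t not in out:
                out.add(t)
                todo.append(t)
        return out

    def run(self, events):
        """Execute over (kind, tag) events using a runtime stack of active-state sets."""
        matched = set()
        stack = [self._closure({0})]
        for kind, tag in events:
            if kind == "end":
                stack.pop()  # backtrack to the parent's active states
                continue
            nxt = set()
            for s in stack[-1]:
                for label in (tag, "*"):
                    if label in self.trans[s]:
                        nxt.add(self.trans[s][label])
                if s in self.loops:
                    nxt.add(s)
            active = self._closure(nxt)
            for s in active:
                matched |= self.accept.get(s, set())
            stack.append(active)
        return matched


def filter_parallel(xml_text, queries, threads, partitions):
    """Split the queries into `partitions` sub-NFAs and run them on `threads` threads.

    partitions == threads -> static: each thread owns one fixed sub-NFA.
    partitions >  threads -> dynamic: idle threads pull the next sub-NFA from the pool queue.
    """
    # Parse the shared document once; every sub-NFA reads the same read-only event list.
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = [(kind, elem.tag) for kind, elem in ET.iterparse(source, events=("start", "end"))]
    nfas = [NFA() for _ in range(partitions)]
    for i, (qid, xpath) in enumerate(queries.items()):
        nfas[i % partitions].add_query(qid, xpath)  # round-robin query assignment
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return set().union(*pool.map(lambda nfa: nfa.run(events), nfas))


if __name__ == "__main__":
    doc = "<a><b><c/></b><d><c/></d></a>"
    qs = {"q1": "/a/b/c", "q2": "//c", "q3": "/a/*/c",
          "q4": "/a/c", "q5": "//d//c", "q6": "/b"}
    print(sorted(filter_parallel(doc, qs, threads=2, partitions=2)))  # ['q1', 'q2', 'q3', 'q5']
    print(sorted(filter_parallel(doc, qs, threads=2, partitions=8)))  # ['q1', 'q2', 'q3', 'q5']
```

Because each sub-NFA only reads the shared event list and keeps its own runtime stack, the threads need no locking, and the union of the per-partition results equals what a single NFA holding all queries would report.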
