Distributed Processing of XPath Queries Using MapReduce

In this paper we investigate the problem of efficiently evaluating XPath queries over large XML data stored in a distributed manner. We propose a MapReduce algorithm based on a query decomposition which computes all expected answers in one MapReduce step. The algorithm can be applied over large XML data which is given either as a single distributed document or as a collection of small XML documents.