Efficient String-Based XML Stream Prefiltering

Whenever huge XML documents have to be evaluated against a given XPath or XQuery query, parsing the whole document, e.g. into SAX events, is the baseline common to all evaluators. Typically, however, only a few parts of the document are relevant and can contribute to the query result. We propose an approach to string-based prefiltering of an XML document D that outputs a smaller document D' containing only the relevant parts of D, such that the query Q evaluated on D' yields the same result as Q evaluated on D. In contrast to previous approaches, ours extends efficient string-based XML prefiltering with support for XML Schema instead of DTDs, for recursive schemata, and for attribute filters. Our experiments on a 1 GB XMark document, averaged over 22 queries, show that our approach outperforms previous prefiltering approaches and reaches an average speed-up factor of 8 compared to XQuery evaluation without prefiltering.
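
To make the correctness condition Q(D) = Q(D') concrete, the following is a minimal sketch of string-based prefiltering in Python. It is not the algorithm proposed here: it uses naive substring search, ignores schema information, and assumes that the relevant element is never nested inside itself, is never self-closing, and never appears inside comments or CDATA sections. The function name `prefilter`, the sample document, and the query `.//item/name` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def prefilter(doc: str, root: str, tag: str) -> str:
    """Copy every <tag>...</tag> fragment of doc into a new <root> document.

    Illustrative assumptions: <tag> is not recursive, not self-closing,
    and does not occur inside comments or CDATA sections.
    """
    open_plain, open_attr, close = f"<{tag}>", f"<{tag} ", f"</{tag}>"
    kept = []
    pos = 0
    while True:
        # Locate the next opening tag by substring search, without parsing.
        hits = [i for i in (doc.find(open_plain, pos), doc.find(open_attr, pos))
                if i != -1]
        if not hits:
            break
        start = min(hits)
        end = doc.find(close, start)
        if end == -1:
            break
        end += len(close)
        kept.append(doc[start:end])  # everything outside the fragment is skipped
        pos = end
    return f"<{root}>" + "".join(kept) + f"</{root}>"


def evaluate(doc: str) -> list:
    """Evaluate the sample query Q = .//item/name and return the text results."""
    return [e.text for e in ET.fromstring(doc).findall(".//item/name")]


D = ('<site><people><person><name>Ann</name></person></people>'
     '<items><item id="1"><name>Bike</name><price>10</price></item>'
     '<item><name>Car</name></item></items></site>')

D_prime = prefilter(D, "site", "item")
```

In this sketch, `evaluate(D)` and `evaluate(D_prime)` return the same list, although `D_prime` is shorter than `D` and the `people` subtree is never handed to the XML parser. The query uses the descendant axis because the sketch flattens the path between the root and the kept fragments; a prefilter that must preserve absolute paths would also have to emit the enclosing ancestor tags.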