Holistic Boolean-Twig Pattern Matching for Efficient XML Query Processing

Twig pattern matching is a critical operation for XML query processing, and the holistic computing approach has shown superior performance over other methods. Since Bruno et al. introduced the first holistic twig join algorithm, TwigStack, numerous so-called holistic twig join algorithms have been proposed. Yet practical XML queries often require support for more general twig patterns, such as the ones that allow arbitrary occurrences of an arbitrary number of logical connectives (AND, OR, and NOT); such types of twigs are referred to as B-twigs (i.e., Boolean-Twigs) or AND/OR/NOT-twigs. We have seen interesting work on generalizing the holistic twig join approach to AND/OR-twigs and AND/NOT-twigs, but have not seen any further effort addressing the problem of AND/OR/NOT-Twigs at the full scale, which therefore forms the main theme of this paper. In this paper, we investigate novel mechanisms for efficient B-twig pattern matching. In particular, we introduce “B-twig normalization” as an important first-step in our approach toward eventually conquering the complexity of B-twigs, and then present BTwigMerge-the first holistic twig join algorithm designed for B-twigs. Both analytical and experimental results show that BTwigMerge is optimal for B-twig patterns with AD (Ancestor-Descendant) edges and/or PC (Parent-Child) edges.