Paths Into Patterns

The XML Path Language (XPath) is an industry-standard notation for addressing parts of an XML document. It is supported by many XML processing libraries and has been used as the foundation for several dedicated XML processing languages. Regular patterns, an alternative way of investigating and deconstructing XML documents, were first proposed in the XDuce language and feature in a number of its descendants. The processing styles offered by XPath and by regular patterns are each convenient for certain kinds of tasks, and the designer of a future XML processing language might well like to provide both. Such a designer might wonder, however, to what extent these mechanisms can be based on a common foundation. Can one be implemented by translating it into the other? Can aspects of both be combined into a single notation? As a first step toward addressing these questions, we show in this paper that a language closely related to the “downward axis” fragment of XPath can be accurately translated into ambiguous XDuce-style regular patterns with a “collect all matches” interpretation.